bbc-vamp-plugins  1.0
Rhythm Class Reference

Calculates rhythmic features of a signal, including onsets and tempo.

#include <Rhythm.h>

Protected Member Functions

void calculateBandFreqs ()
float halfHanning (float n)
float canny (float n)
float findRemainder (vector< int > peaks, int thisPeak)
float findTempo (vector< int > peaks)
float findMeanPeak (vector< float > signal, vector< int > peaks, int shift)
void findCorrelationPeaks (vector< float > autocor_in, float percentile_in, int windowLength_in, int shift_in, vector< int > &peaks_out, vector< int > &valleys_out)
void autocorrelation (vector< float > signal_in, int startShift_in, int endShift_in, vector< float > &autocor_out)
void findOnsetPeaks (vector< float > onset_in, int windowLength_in, vector< int > &peaks_out)
void movingAverage (vector< float > signal_in, int windowLength_in, float threshold_in, vector< float > &average_out, vector< float > &difference_out)
void normalise (vector< float > signal_in, vector< float > &normalised_out)
void halfHannConvolve (vector< vector< float > > &envelope_out)
void cannyConvolve (vector< vector< float > > envelope_in, vector< float > &onset_out)

Protected Attributes

int numBands
float * bandHighFreq
int halfHannLength
float * halfHannWindow
int cannyLength
float cannyShape
float * cannyWindow
vector< vector< float > > intensity
float threshold
int average_window
int peak_window
int max_bpm
int min_bpm

Detailed Description

Calculates rhythmic features of a signal, including onsets and tempo.

Outputs

Onset Curve
The filtered and half-wave rectified intensity of the signal, used to detect onsets.
Average
The moving average of the onset curve plus the threshold. Used to select the peaks of the onset curve.
Difference
The difference between the onset curve and its moving average. Used as the input for peak-picking.
Onset
The detected note onsets.
Average onset frequency
The mean number of onsets per minute.
Rhythm strength
The mean value of the peaks in the onset curve.
Autocorrelation
The autocorrelation of the difference curve.
Mean Correlation Peak
The mean value of the peaks in the autocorrelation.
Peak-Valley Ratio
The mean peak-valley ratio of the autocorrelation.
Tempo
The estimated tempo in beats per minute.

Parameters

Sub-bands
Number of sub-bands to divide the signal into for applying the half-hanning window. A higher number increases accuracy at the cost of processing time. (default = 7)
Threshold
Amount by which to increase the moving average filter. A higher number produces fewer onsets. (default = 1.0)
Moving average window length
Length of moving average window. A higher number produces a smoother curve. (default = 200)
Onset peak window length
Length of window used to select peaks in the difference curve. (default = 6)
Minimum BPM
Minimum tempo calculated using the autocorrelation. (default = 12)
Maximum BPM
Maximum tempo calculated using the autocorrelation. (default = 300)

Description

The rhythm features are based on the features described in [1] (section 3C), combined with some techniques from [2].

Firstly the spectrum is divided into \(n\) sub-bands with the following frequency ranges.

\[ \left(0,\frac{F_s}{2^n}\right) , \left(\frac{F_s}{2^n}, \frac{F_s}{2^{n-1}}\right) , \ldots , \left(\frac{F_s}{2^2}, \frac{F_s}{2^1}\right) \]
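
As an illustration of the band layout above, the following minimal sketch computes the upper edge of each sub-band. The function name `bandUpperEdges` and its signature are hypothetical and are not taken from the plugin source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not the plugin's calculateBandFreqs): returns the
// upper edge of each of the n sub-bands, lowest band first.
// Band k (0-based) ends at Fs / 2^(n-k), so the last band ends at Fs / 2.
std::vector<float> bandUpperEdges(float sampleRate, int numBands) {
    assert(numBands > 0 && numBands < 31);  // keep the bit shift in range
    std::vector<float> edges;
    edges.reserve(numBands);
    for (int k = 0; k < numBands; ++k) {
        edges.push_back(sampleRate / static_cast<float>(1 << (numBands - k)));
    }
    return edges;
}
```

With a sample rate of 44100 Hz and the default of 7 sub-bands, the lowest band ends at 44100 / 128 Hz and the highest at 22050 Hz.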

For each sub-band, the magnitudes of the FFT bins are summed, producing \(n\) signals. Each of the signals is convolved with a half-hanning window, where \(L\) is set as 12.

\[ H(w) = 0.5 + 0.5\cos\left(2\pi \cdot \frac{w}{2L-1} \right) \hspace{20px} w\in[0, L-1] \]
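
The window formula can be evaluated directly. The sketch below is one possible way to build the coefficients; `halfHanningWindow` is a hypothetical name and the plugin's own `halfHanning` member may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: evaluates H(w) = 0.5 + 0.5*cos(2*pi*w / (2L-1))
// for w = 0 .. L-1 and returns the L coefficients.
std::vector<float> halfHanningWindow(int length) {
    assert(length > 0);
    const double pi = std::acos(-1.0);
    std::vector<float> window(length);
    for (int w = 0; w < length; ++w) {
        const double phase = 2.0 * pi * w / (2.0 * length - 1.0);
        window[w] = static_cast<float>(0.5 + 0.5 * std::cos(phase));
    }
    return window;
}
```

The window starts at 1 and decays towards 0, which is what makes it act as a one-sided smoothing filter.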

Subsequently, each of the signals is convolved with a peak-enhancing canny window, where \(L\) is set as 12 and \(\sigma\) is set as 4.

\[ C(w) = \frac{w}{\sigma^2}e^{-\frac{w^2}{2\sigma^2}} \hspace{20px} w\in[-L,L] \]
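
A minimal sketch of the canny window coefficients, written straight from the formula above. The name `cannyWindow` and its parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: evaluates C(w) = (w / sigma^2) * exp(-w^2 / (2*sigma^2))
// for w = -L .. L and returns the 2L+1 coefficients.
std::vector<float> cannyWindow(int halfLength, float sigma) {
    assert(halfLength >= 0 && sigma > 0.0f);
    const double s2 = static_cast<double>(sigma) * static_cast<double>(sigma);
    std::vector<float> window(2 * halfLength + 1);
    for (int w = -halfLength; w <= halfLength; ++w) {
        const double x = static_cast<double>(w);
        window[w + halfLength] =
            static_cast<float>(x / s2 * std::exp(-(x * x) / (2.0 * s2)));
    }
    return window;
}
```

The window is antisymmetric around \(w = 0\), so convolving with it behaves like a smoothed differentiator and emphasises rises in the envelope.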

The \(n\) signals are summed and half-wave rectified to produce the onset curve.

The moving average \(A\) of the onset curve \(O\) is produced from the mean value of a rectangular window of length \((2L+1)\), plus a threshold \(t\). The threshold and moving average window length parameters control \(t\) and \(L\) respectively.

\[ A(x) = \displaystyle\sum\limits_{y=-L}^{L} \frac{O(x+y)}{2L+1} + t \]

The difference signal is created by subtracting the moving average from the onset curve and applying half-wave rectification.
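
The moving average and difference steps can be sketched as follows. The struct and function names are hypothetical, and the treatment of the signal edges (samples outside the signal count as zero) is an assumption, since the text above does not specify it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical container for the two curves described above.
struct AverageAndDifference {
    std::vector<float> average;     // A(x): windowed mean plus threshold t
    std::vector<float> difference;  // half-wave rectified O(x) - A(x)
};

// Sketch of the moving average with half-window L and threshold t.
AverageAndDifference averageAndDifference(const std::vector<float>& onset,
                                          int halfWindow, float threshold) {
    assert(halfWindow >= 0);
    const int n = static_cast<int>(onset.size());
    AverageAndDifference out;
    out.average.resize(n);
    out.difference.resize(n);
    for (int x = 0; x < n; ++x) {
        double sum = 0.0;
        for (int y = -halfWindow; y <= halfWindow; ++y) {
            const int i = x + y;
            // Assumption: samples outside the signal contribute zero.
            if (i >= 0 && i < n) sum += onset[i];
        }
        out.average[x] =
            static_cast<float>(sum / (2 * halfWindow + 1)) + threshold;
        // Half-wave rectification keeps only the part above the average.
        out.difference[x] = std::max(0.0f, onset[x] - out.average[x]);
    }
    return out;
}
```

Raising the threshold lifts the average curve, so fewer samples of the onset curve survive the rectification.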

An onset is detected when a sample is the maximum within a given window of length \((2L+1)\), where \(L\) is set by the parameter onset peak window length.
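
One way to express this peak-picking rule is sketched below. The name `pickPeaks` is hypothetical. Two details are assumptions rather than documented behaviour: samples at or below zero are never reported, and when equal values form a plateau only the first sample is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: returns the indices whose value is the maximum within
// [x - L, x + L], clipped to the signal bounds.
std::vector<int> pickPeaks(const std::vector<float>& signal, int halfWindow) {
    assert(halfWindow >= 0);
    const int n = static_cast<int>(signal.size());
    std::vector<int> peaks;
    for (int x = 0; x < n; ++x) {
        if (signal[x] <= 0.0f) continue;  // assumption: ignore empty regions
        const int lo = std::max(0, x - halfWindow);
        const int hi = std::min(n - 1, x + halfWindow);
        bool isMax = true;
        for (int j = lo; j <= hi && isMax; ++j) {
            // Assumption: on a tie, the earliest sample wins.
            if (j < x && signal[j] >= signal[x]) isMax = false;
            if (j > x && signal[j] > signal[x]) isMax = false;
        }
        if (isMax) peaks.push_back(x);
    }
    return peaks;
}
```

A longer window suppresses secondary peaks that sit close to a larger one, so it produces fewer onsets.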

The average onset frequency is the total number of onsets divided by the length of the track in minutes.

The rhythm strength is the mean value of the peaks of the onset curve (pre-averaging).
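
Both summary values are simple averages. The sketch below uses hypothetical function names and assumes that the track duration is available in seconds and that an empty peak list yields a strength of zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: number of onsets divided by the track length in minutes.
float onsetsPerMinute(std::size_t onsetCount, float durationSeconds) {
    assert(durationSeconds > 0.0f);
    return static_cast<float>(onsetCount) * 60.0f / durationSeconds;
}

// Sketch: mean value of a curve at the given peak indices.
float meanAtPeaks(const std::vector<float>& curve,
                  const std::vector<int>& peaks) {
    if (peaks.empty()) return 0.0f;  // assumption: no peaks means zero
    double sum = 0.0;
    for (int p : peaks) {
        assert(p >= 0 && p < static_cast<int>(curve.size()));
        sum += curve[p];
    }
    return static_cast<float>(sum / static_cast<double>(peaks.size()));
}
```

For example, 90 onsets in a track of 180 seconds gives an average onset frequency of 30 per minute.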

The autocorrelation is the autocorrelation of the difference signal between delays of \(\frac{60}{T_{max}}\cdot\frac{F_s}{s}\) frames and \(\frac{60}{T_{min}}\cdot\frac{F_s}{s}\) frames, where \(T_{min}\) and \(T_{max}\) are the min/max tempo in BPM and \(s\) is the step size in number of frames.
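
The delay range and the autocorrelation itself can be sketched as follows. The function names are hypothetical. Rounding the delay to the nearest frame and leaving the sums unnormalised are assumptions, as the text above does not state either.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: delay in frames for a tempo in BPM, i.e. (60 / T) * (Fs / s).
// Assumption: the result is rounded to the nearest whole frame.
int bpmToLag(float bpm, float sampleRate, int stepSize) {
    assert(bpm > 0.0f && stepSize > 0);
    return static_cast<int>(std::lround(60.0 / bpm * sampleRate / stepSize));
}

// Sketch: unnormalised autocorrelation for delays startLag .. endLag.
std::vector<float> autocorrelate(const std::vector<float>& signal,
                                 int startLag, int endLag) {
    assert(startLag >= 0 && startLag <= endLag);
    const int n = static_cast<int>(signal.size());
    std::vector<float> result;
    result.reserve(endLag - startLag + 1);
    for (int lag = startLag; lag <= endLag; ++lag) {
        double sum = 0.0;
        for (int x = 0; x + lag < n; ++x) {
            sum += static_cast<double>(signal[x]) * signal[x + lag];
        }
        result.push_back(static_cast<float>(sum));
    }
    return result;
}
```

Because a faster tempo corresponds to a shorter delay, the maximum BPM sets the first delay and the minimum BPM sets the last one.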

The peaks of the autocorrelation, \(P_i\), are defined as the samples that exceed a certain threshold, defined as the 95% confidence interval, and whose value is the maximum within a 7-sample window. The mean correlation peak is the mean value of the selected peaks, and the peak-valley ratio is the ratio between the mean correlation peak and the mean value of the valleys. A valley is defined as the minimum value between two peaks.
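
Given a list of already selected peak positions, the peak-valley ratio can be sketched as below. The function name is hypothetical, the peak selection itself is not shown, and returning zero when there are fewer than two peaks or when the mean valley is zero is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: mean peak value divided by mean valley value, where a valley is
// the minimum of the autocorrelation between two consecutive peaks.
// Expects peak indices in ascending order.
float peakValleyRatio(const std::vector<float>& autocor,
                      const std::vector<int>& peaks) {
    if (peaks.size() < 2) return 0.0f;  // assumption: no valley, no ratio
    double peakSum = 0.0;
    double valleySum = 0.0;
    for (std::size_t i = 0; i < peaks.size(); ++i) {
        assert(peaks[i] >= 0 && peaks[i] < static_cast<int>(autocor.size()));
        peakSum += autocor[peaks[i]];
        if (i + 1 < peaks.size()) {
            assert(peaks[i] < peaks[i + 1]);
            assert(peaks[i + 1] < static_cast<int>(autocor.size()));
            valleySum += *std::min_element(autocor.begin() + peaks[i],
                                           autocor.begin() + peaks[i + 1] + 1);
        }
    }
    const double meanPeak = peakSum / static_cast<double>(peaks.size());
    const double meanValley =
        valleySum / static_cast<double>(peaks.size() - 1);
    if (meanValley == 0.0) return 0.0f;  // assumption: avoid dividing by zero
    return static_cast<float>(meanPeak / meanValley);
}
```

A strongly periodic signal gives tall peaks separated by deep valleys, and therefore a high ratio.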

The tempo is defined as the greatest common divisor of the detected peaks. It is found by minimising the function below:

\[ T = \underset{P_k}{\operatorname{argmin}} \displaystyle\sum\limits_{i=1}^{N} \left|\frac{P_i}{P_k}-\text{round}\left(\frac{P_i}{P_k}\right)\right|\]
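
The minimisation can be sketched as an exhaustive search over the detected peaks. The function names are hypothetical. Two assumptions are made: the peaks are expressed as positive delays in frames, and the selected delay is converted to BPM by inverting the delay formula used for the autocorrelation range.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sketch: returns the peak delay P_k that minimises
// sum_i | P_i / P_k - round(P_i / P_k) |. Ties keep the earliest candidate.
int bestCommonDivisorLag(const std::vector<int>& peakLags) {
    assert(!peakLags.empty());
    int bestLag = peakLags.front();
    double bestCost = std::numeric_limits<double>::infinity();
    for (int candidate : peakLags) {
        assert(candidate > 0);
        double cost = 0.0;
        for (int lag : peakLags) {
            const double ratio = static_cast<double>(lag) / candidate;
            cost += std::fabs(ratio - std::round(ratio));
        }
        if (cost < bestCost) {
            bestCost = cost;
            bestLag = candidate;
        }
    }
    return bestLag;
}

// Sketch: converts a delay in frames back to beats per minute.
float lagToBpm(int lag, float sampleRate, int stepSize) {
    assert(lag > 0 && stepSize > 0);
    return static_cast<float>(60.0 * sampleRate /
                              (static_cast<double>(stepSize) * lag));
}
```

For peaks at delays of 10, 20, 30 and 41 frames, the candidate 10 gives a cost of about 0.1 while every other candidate costs at least 1, so 10 is selected as the common divisor.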

References

[1] Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech and Language Processing (Vol. 14, pp. 5-18).

[2] Dixon, S. (2006). Onset Detection Revisited. International Conference on Digital Audio Effects (DAFx) (pp. 133-137).


Member Function Documentation

void Rhythm::calculateBandFreqs ( ) [protected]

BBC Vamp plugin collection

Copyright (c) 2011-2013 British Broadcasting Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Member Data Documentation

int Rhythm::average_window [protected]

Length of moving average window

float* Rhythm::bandHighFreq [protected]

Upper frequency of each sub-band

int Rhythm::cannyLength [protected]

Length of canny window

float Rhythm::cannyShape [protected]

Shape of canny window

float* Rhythm::cannyWindow [protected]

Co-efficients of canny window

int Rhythm::halfHannLength [protected]

Length of half-hanning window

float* Rhythm::halfHannWindow [protected]

Co-efficients of half-hanning window

vector<vector<float> > Rhythm::intensity [protected]

Intensity value for each block

int Rhythm::max_bpm [protected]

Maximum BPM detected in autocorrelation

int Rhythm::min_bpm [protected]

Minimum BPM detected in autocorrelation

int Rhythm::numBands [protected]

Number of sub-bands

int Rhythm::peak_window [protected]

Length of peak-picking window

float Rhythm::threshold [protected]

Threshold value added to moving average


The documentation for this class was generated from the following files: