bbc-vamp-plugins
1.0
Calculates rhythmic features of a signal, including onsets and tempo.
#include <Rhythm.h>
Protected Member Functions | |
void | calculateBandFreqs () |
float | halfHanning (float n) |
float | canny (float n) |
float | findRemainder (vector< int > peaks, int thisPeak) |
float | findTempo (vector< int > peaks) |
float | findMeanPeak (vector< float > signal, vector< int > peaks, int shift) |
void | findCorrelationPeaks (vector< float > autocor_in, float percentile_in, int windowLength_in, int shift_in, vector< int > &peaks_out, vector< int > &valleys_out) |
void | autocorrelation (vector< float > signal_in, int startShift_in, int endShift_in, vector< float > &autocor_out) |
void | findOnsetPeaks (vector< float > onset_in, int windowLength_in, vector< int > &peaks_out) |
void | movingAverage (vector< float > signal_in, int windowLength_in, float threshold_in, vector< float > &average_out, vector< float > &difference_out) |
void | normalise (vector< float > signal_in, vector< float > &normalised_out) |
void | halfHannConvolve (vector< vector< float > > &envelope_out) |
void | cannyConvolve (vector< vector< float > > envelope_in, vector< float > &onset_out) |
Protected Attributes | |
int | numBands |
float * | bandHighFreq |
int | halfHannLength |
float * | halfHannWindow |
int | cannyLength |
float | cannyShape |
float * | cannyWindow |
vector< vector< float > > | intensity |
float | threshold |
int | average_window |
int | peak_window |
int | max_bpm |
int | min_bpm |
The rhythm features are based on the features described in [1] (section 3C), combined with some techniques from [2].
Firstly the spectrum is divided into \(n\) sub-bands with the following frequency ranges.
\[ \left(0,\frac{F_s}{2^n}\right) , \left(\frac{F_s}{2^n}, \frac{F_s}{2^{n-1}}\right) , \ldots \left(\frac{F_s}{2^2}, \frac{F_s}{2^1}\right) \]
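The band edges above can be sketched as follows. This is only an illustration of the formula, not the plugin's own `calculateBandFreqs()`; the function name `bandUpperEdges` and its parameters are hypothetical. Note that the top edge \(\frac{F_s}{2^1}\) is the Nyquist frequency.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Upper edge of each of the n sub-bands, lowest band first:
// Fs/2^n, Fs/2^(n-1), ..., Fs/2^1.
// Hypothetical helper, written from the formula in the text.
std::vector<float> bandUpperEdges(float sampleRate, int numBands) {
    assert(sampleRate > 0.0f && numBands > 0);
    std::vector<float> edges(numBands);
    for (int k = 0; k < numBands; ++k) {
        const float exponent = static_cast<float>(numBands - k);
        edges[k] = sampleRate / std::pow(2.0f, exponent);
    }
    return edges;
}
```

For example, `bandUpperEdges(44100.0f, 3)` yields edges at 5512.5 Hz, 11025 Hz and 22050 Hz.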
For each sub-band, the magnitudes of the FFT bins are summed, producing \(n\) signals. Each signal is convolved with a half-hanning window of length \(L\), where \(L\) is set to 12.
\[ H(w) = 0.5 + 0.5\cos\left(2\pi \cdot \frac{w}{2L-1} \right) \hspace{20px} w\in[0, L-1] \]
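A minimal sketch of this window and of the convolution step is given below. The names `halfHanningWindow` and `convolveCausal` are hypothetical, and treating samples before the start of the signal as zero is an assumption of the sketch, since the text does not specify the edge handling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// H(w) = 0.5 + 0.5*cos(2*pi*w / (2L-1)) for w = 0 .. L-1.
std::vector<float> halfHanningWindow(int length) {
    assert(length > 0);
    const float pi = 3.14159265358979f;
    std::vector<float> window(length);
    for (int w = 0; w < length; ++w) {
        window[w] = 0.5f + 0.5f * std::cos(2.0f * pi * w / (2.0f * length - 1.0f));
    }
    return window;
}

// Causal convolution: y[t] = sum_w window[w] * x[t - w].
// Assumption: samples before the start of the signal count as zero.
std::vector<float> convolveCausal(const std::vector<float>& x,
                                  const std::vector<float>& window) {
    assert(!window.empty());
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t) {
        for (std::size_t w = 0; w < window.size() && w <= t; ++w) {
            y[t] += window[w] * x[t - w];
        }
    }
    return y;
}
```

The window starts at 1 for \(w=0\) and decays monotonically towards zero at \(w=L-1\), so the convolution acts as a one-sided smoothing of each band envelope.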
Subsequently, each signal is convolved with a peak-enhancing canny window, where \(L\) is set to 12 and \(\sigma\) is set to 4.
\[ C(w) = \frac{w}{\sigma^2}e^{-\frac{w^2}{2\sigma^2}} \hspace{20px} w\in[-L,L] \]
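The canny window can be tabulated directly from the formula. The sketch below uses a hypothetical function name, `cannyWindowSketch`, and stores the value for \(w=-L\) at index 0; that storage order is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// C(w) = (w / sigma^2) * exp(-w^2 / (2*sigma^2)) for w = -L .. L.
// The result has 2L+1 entries; entry [w + L] holds C(w).
std::vector<float> cannyWindowSketch(int halfLength, float sigma) {
    assert(halfLength > 0 && sigma > 0.0f);
    const float sigmaSq = sigma * sigma;
    std::vector<float> window(2 * halfLength + 1);
    for (int w = -halfLength; w <= halfLength; ++w) {
        const float wf = static_cast<float>(w);
        window[w + halfLength] = (wf / sigmaSq) * std::exp(-(wf * wf) / (2.0f * sigmaSq));
    }
    return window;
}
```

The window is antisymmetric about \(w=0\), where it is zero, and its positive lobe peaks at \(w=\sigma\). With \(L=12\) and \(\sigma=4\) that peak value is \(0.25e^{-0.5}\approx 0.1516\).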
The \(n\) signals are summed and half-wave rectified to produce the onset curve.
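As a sketch of this step, assuming all band signals have the same length, the summation and half-wave rectification could look like this. The name `onsetCurveSketch` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the per-band signals sample by sample, then clamp negative
// values to zero (half-wave rectification).
std::vector<float> onsetCurveSketch(const std::vector<std::vector<float> >& bands) {
    assert(!bands.empty());
    const std::size_t length = bands.front().size();
    std::vector<float> onset(length, 0.0f);
    for (std::size_t b = 0; b < bands.size(); ++b) {
        assert(bands[b].size() == length);
        for (std::size_t i = 0; i < length; ++i) {
            onset[i] += bands[b][i];
        }
    }
    for (std::size_t i = 0; i < length; ++i) {
        onset[i] = std::max(onset[i], 0.0f);
    }
    return onset;
}
```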
The moving average \(A\) of the onset curve \(O\) is produced from the mean value of a rectangular window of length \((2L+1)\), plus a threshold \(t\). The threshold and moving average window length parameters control \(t\) and \(L\) respectively.
\[ A(x) = t + \frac{1}{2L+1} \displaystyle\sum\limits_{y=-L}^{L} O(x+y) \]
The difference signal is created by subtracting the moving average from the onset curve and applying half-wave rectification.
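The moving average and difference signal can be sketched together as below. The function name `movingAverageSketch` is hypothetical, and samples outside the onset curve are assumed to be zero while the divisor stays at \(2L+1\); the text does not say how the edges are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// average[x]    = t + (1 / (2L+1)) * sum_{y=-L..L} onset[x + y]
// difference[x] = max(0, onset[x] - average[x])
// Assumption: samples outside the signal contribute zero to the sum.
void movingAverageSketch(const std::vector<float>& onset, int halfWindow, float threshold,
                         std::vector<float>& average, std::vector<float>& difference) {
    assert(halfWindow >= 0);
    const int length = static_cast<int>(onset.size());
    average.assign(length, 0.0f);
    difference.assign(length, 0.0f);
    for (int x = 0; x < length; ++x) {
        float sum = 0.0f;
        for (int y = -halfWindow; y <= halfWindow; ++y) {
            const int i = x + y;
            if (i >= 0 && i < length) {
                sum += onset[i];
            }
        }
        average[x] = sum / (2 * halfWindow + 1) + threshold;
        difference[x] = std::max(0.0f, onset[x] - average[x]);
    }
}
```

For the onset curve {0, 0, 3, 0, 0} with \(L=1\) and \(t=0.5\), the average is {0.5, 1.5, 1.5, 1.5, 0.5} and the difference signal is {0, 0, 1.5, 0, 0}.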
An onset is detected when a sample is the maximum within a given window of length \((2L+1)\), where \(L\) is set by the parameter onset peak window length.
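One way to sketch this peak picking is shown below. The name `findOnsetPeaksSketch` is hypothetical. Two details are assumptions rather than documented behaviour: a sample must be strictly positive to count as an onset, and ties within a window go to the earliest sample.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the indices of samples that are the maximum within
// [x - L, x + L], clipped to the signal bounds.
// Assumptions: zero-valued samples are never onsets; on a tie the
// earliest sample in the window wins.
std::vector<int> findOnsetPeaksSketch(const std::vector<float>& signal, int halfWindow) {
    assert(halfWindow >= 1);
    const int length = static_cast<int>(signal.size());
    std::vector<int> peaks;
    for (int x = 0; x < length; ++x) {
        if (signal[x] <= 0.0f) {
            continue;
        }
        bool isMax = true;
        const int first = std::max(0, x - halfWindow);
        const int last = std::min(length - 1, x + halfWindow);
        for (int j = first; j <= last && isMax; ++j) {
            if (j < x && signal[j] >= signal[x]) isMax = false;
            if (j > x && signal[j] > signal[x]) isMax = false;
        }
        if (isMax) {
            peaks.push_back(x);
        }
    }
    return peaks;
}
```

With the signal {0, 1, 0, 0, 2, 1, 0}, a window of \(L=1\) reports onsets at indices 1 and 4, whereas \(L=3\) keeps only index 4 because the larger window lets the stronger peak suppress the weaker one.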
The average onset frequency is the total number of onsets divided by the length of the track in minutes.
The rhythm strength is the mean value of the peaks of the onset curve (pre-averaging).
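These two summary features reduce to short calculations. In the sketch below the function names are hypothetical, and the track length is assumed to be derived from the number of processed frames, the step size in samples and the sample rate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Onsets per minute. Assumption: track length in minutes is
// numFrames * stepSize / sampleRate / 60.
float averageOnsetFrequencySketch(std::size_t numOnsets, std::size_t numFrames,
                                  int stepSize, float sampleRate) {
    assert(numFrames > 0 && stepSize > 0 && sampleRate > 0.0f);
    const float minutes = static_cast<float>(numFrames) * stepSize / sampleRate / 60.0f;
    return static_cast<float>(numOnsets) / minutes;
}

// Mean value of the onset curve at the detected peak positions.
// Returns 0 when no peaks were found (an assumption of this sketch).
float rhythmStrengthSketch(const std::vector<float>& onsetCurve,
                           const std::vector<int>& peaks) {
    if (peaks.empty()) {
        return 0.0f;
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < peaks.size(); ++i) {
        assert(peaks[i] >= 0 && static_cast<std::size_t>(peaks[i]) < onsetCurve.size());
        sum += onsetCurve[peaks[i]];
    }
    return sum / static_cast<float>(peaks.size());
}
```

For instance, 120 onsets over 6000 frames at a step size of 441 samples and a 44100 Hz sample rate (one minute of audio) give an average onset frequency of 120 per minute.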
The autocorrelation of the difference signal is calculated for delays between \(\frac{60}{T_{max}}\cdot\frac{F_s}{s}\) frames and \(\frac{60}{T_{min}}\cdot\frac{F_s}{s}\) frames, where \(T_{min}\) and \(T_{max}\) are the minimum and maximum tempo in BPM and \(s\) is the step size in number of frames.
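The delay range and the autocorrelation itself can be sketched as follows. Both function names are hypothetical. Rounding the delay to the nearest whole frame, including both end delays, and leaving the sums unnormalised are all assumptions, since the text does not specify them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Delay in frames for a tempo in BPM: (60 / T) * (Fs / s).
// Assumption: the result is rounded to the nearest whole frame.
int tempoToLagSketch(float bpm, float sampleRate, int stepSize) {
    assert(bpm > 0.0f && sampleRate > 0.0f && stepSize > 0);
    return static_cast<int>(std::lround((60.0f / bpm) * (sampleRate / stepSize)));
}

// Unnormalised autocorrelation r[k] = sum_i x[i] * x[i + k]
// for k = startLag .. endLag (both inclusive, an assumption).
std::vector<float> autocorrelationSketch(const std::vector<float>& signal,
                                         int startLag, int endLag) {
    assert(0 <= startLag && startLag <= endLag);
    std::vector<float> result;
    for (int lag = startLag; lag <= endLag; ++lag) {
        const std::size_t shift = static_cast<std::size_t>(lag);
        float sum = 0.0f;
        for (std::size_t i = 0; i + shift < signal.size(); ++i) {
            sum += signal[i] * signal[i + shift];
        }
        result.push_back(sum);
    }
    return result;
}
```

At a 44100 Hz sample rate and a step size of 441, a maximum tempo of 240 BPM corresponds to a delay of 25 frames and a minimum tempo of 60 BPM to a delay of 100 frames.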
The peaks of the autocorrelation, \(P_i\), are defined as those samples which are above a certain threshold, defined as the 95% confidence interval, and whose value is the maximum within a 7-sample window. The mean correlation peak is the mean value of the selected peaks, and the peak-valley ratio is the ratio between the mean correlation peak and the mean value of the valleys. A valley is defined as the minimum value between two peaks.
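The peak and valley selection might be sketched as below. All names are hypothetical. The threshold is passed in as a ready-made value, because the text does not spell out how the 95% figure is computed; ties within a window go to the earliest sample, which is also an assumption. A half-window of 3 gives the 7-sample window described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct CorrelationPeaksSketch {
    std::vector<int> peaks;    // indices of selected peaks
    std::vector<int> valleys;  // index of the minimum between consecutive peaks
};

CorrelationPeaksSketch findPeaksAndValleysSketch(const std::vector<float>& autocor,
                                                 float threshold, int halfWindow) {
    assert(halfWindow >= 1);
    const int length = static_cast<int>(autocor.size());
    CorrelationPeaksSketch out;
    for (int x = 0; x < length; ++x) {
        if (autocor[x] <= threshold) continue;
        bool isMax = true;
        const int first = std::max(0, x - halfWindow);
        const int last = std::min(length - 1, x + halfWindow);
        for (int j = first; j <= last && isMax; ++j) {
            if (j < x && autocor[j] >= autocor[x]) isMax = false;
            if (j > x && autocor[j] > autocor[x]) isMax = false;
        }
        if (!isMax) continue;
        if (!out.peaks.empty()) {
            // Valley: smallest value strictly between the previous peak and this one.
            int valley = out.peaks.back() + 1;
            for (int j = valley; j < x; ++j) {
                if (autocor[j] < autocor[valley]) valley = j;
            }
            out.valleys.push_back(valley);
        }
        out.peaks.push_back(x);
    }
    return out;
}

// Mean of the signal at the given indices (0 if there are none).
float meanAtSketch(const std::vector<float>& signal, const std::vector<int>& indices) {
    if (indices.empty()) return 0.0f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < indices.size(); ++i) sum += signal[indices[i]];
    return sum / static_cast<float>(indices.size());
}
```

The mean correlation peak is then `meanAtSketch(autocor, result.peaks)`, and the peak-valley ratio is that value divided by `meanAtSketch(autocor, result.valleys)`, provided at least one valley exists.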
The tempo is defined as the greatest common divisor of the detected peaks. It is found by choosing the peak \(P_k\) that minimises the function below:
\[ T = \underset{P_k}{\text{argmin}} \displaystyle\sum\limits_{i=1}^{N} \left|\frac{P_i}{P_k}-\text{round}\left(\frac{P_i}{P_k}\right)\right|\]
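The minimisation can be sketched as a direct search over the detected peaks. The function names below are hypothetical and are not the plugin's `findRemainder()` or `findTempo()`. The peaks are assumed to be expressed as delays in frames, and the conversion to BPM is an assumption obtained by inverting the delay formula used for the autocorrelation range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum over all peaks of |P_i / P_k - round(P_i / P_k)| for one candidate P_k.
float remainderCostSketch(const std::vector<int>& lags, int candidate) {
    assert(candidate > 0);
    float cost = 0.0f;
    for (std::size_t i = 0; i < lags.size(); ++i) {
        const float ratio = static_cast<float>(lags[i]) / static_cast<float>(candidate);
        cost += std::fabs(ratio - std::round(ratio));
    }
    return cost;
}

// Returns the peak delay with the smallest cost (first one on a tie).
int bestLagSketch(const std::vector<int>& lags) {
    assert(!lags.empty());
    int best = lags[0];
    float bestCost = remainderCostSketch(lags, best);
    for (std::size_t k = 1; k < lags.size(); ++k) {
        const float cost = remainderCostSketch(lags, lags[k]);
        if (cost < bestCost) {
            bestCost = cost;
            best = lags[k];
        }
    }
    return best;
}

// Assumed conversion: a delay of `lag` frames corresponds to
// 60 * Fs / (s * lag) beats per minute.
float lagToBpmSketch(int lag, float sampleRate, int stepSize) {
    assert(lag > 0 && sampleRate > 0.0f && stepSize > 0);
    return 60.0f * sampleRate / (static_cast<float>(stepSize) * static_cast<float>(lag));
}
```

For peaks at delays of 50, 100 and 150 frames, the candidate 50 divides every peak exactly and has a cost of 0, while the candidate 100 has a cost of 1. At a 44100 Hz sample rate and a step size of 441, a delay of 50 frames converts to 120 BPM.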
[1] Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech and Language Processing (Vol. 14, pp. 5-18).
[2] Dixon, S. (2006). Onset Detection Revisited. International Conference on Digital Audio Effects (DAFx) (pp. 133-137).
void Rhythm::calculateBandFreqs ( ) [protected] |
BBC Vamp plugin collection
Copyright (c) 2011-2013 British Broadcasting Corporation
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
int Rhythm::average_window [protected] |
Length of moving average window
float* Rhythm::bandHighFreq [protected] |
Upper frequency of each sub-band
int Rhythm::cannyLength [protected] |
Length of canny window
float Rhythm::cannyShape [protected] |
Shape of canny window
float* Rhythm::cannyWindow [protected] |
Coefficients of canny window
int Rhythm::halfHannLength [protected] |
Length of half-hanning window
float* Rhythm::halfHannWindow [protected] |
Coefficients of half-hanning window
vector<vector<float> > Rhythm::intensity [protected] |
Intensity value for each block
int Rhythm::max_bpm [protected] |
Maximum BPM detected in autocorrelation
int Rhythm::min_bpm [protected] |
Minimum BPM detected in autocorrelation
int Rhythm::numBands [protected] |
Number of sub-bands
int Rhythm::peak_window [protected] |
Length of peak-picking window
float Rhythm::threshold [protected] |
Threshold value added to moving average