GAUSSIAN MIXTURE MODELING USING SHORT TIME FOURIER TRANSFORM FEATURES FOR AUDIO FINGERPRINTING (FriAmPO1)
Author(s) :
Arunan Ramalingam (Ryerson University, Canada)
Sridhar Krishnan (Ryerson University, Canada)
Abstract : In audio fingerprinting, a song is recognized by matching an extracted fingerprint against a database of previously computed fingerprints. One of the key issues in fingerprinting is generating fingerprints that discriminate among different songs and are, at the same time, invariant to distorted versions of the same song. In this paper, we evaluate various features, such as spectral centroid, spectral bandwidth, spectral flatness measure, spectral crest factor, Rényi's entropy and Mel-frequency cepstral coefficients, under a large number of distortions by modeling them using Gaussian mixture models (GMM). To make the system more robust, we use distorted versions of the audio for training. However, we show that audio fingerprints modeled using GMM are robust not only to the distortions used in training but also to distortions not used in training. By modeling audio fingerprints with GMM using spectral centroid and spectral flatness measure alone as features, we obtain a recognition performance of 99.5%.
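
As a rough illustration of the two features singled out above, the following Python sketch computes the spectral centroid and the spectral flatness measure for a single short-time frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the naive DFT, the power-spectrum form of the flatness measure and the `eps` guard are all illustrative choices, and the abstract does not specify frame length, windowing or the GMM configuration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame.

    Uses a naive O(n^2) DFT to stay dependency-free (an assumption made
    for illustration; a real system would use an FFT).
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, n_fft):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n_fft * m for k, m in enumerate(mags)) / total


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 1 indicate a noise-like (flat) spectrum and values near 0
    a tonal one. eps is a hypothetical guard against log(0).
    """
    power = [m * m + eps for m in mags]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


# Example frame: a pure tone that falls exactly on DFT bin 8.
n_fft, sample_rate = 64, 8000
frame = [math.sin(2 * math.pi * 8 * t / n_fft) for t in range(n_fft)]
mags = magnitude_spectrum(frame)
centroid = spectral_centroid(mags, sample_rate, n_fft)
flatness = spectral_flatness(mags)
```

For the pure tone in the example, the centroid sits at the tone frequency and the flatness is close to zero. In a system of the kind the abstract describes, per-frame values like these would form the feature vectors modeled by one GMM per song.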
