Comparison of four classifiers for speech-music discrimination: a first case study for Costa Rican radio broadcasting

Abstract

Over the past decades, a vast amount of audio data has become available in most languages and regions of the world. Efficient organization and manipulation of this data are important for tasks such as data classification, information search, and diarization, among many others, and can also be relevant for building corpora to train automatic speech recognition models or to build speech synthesis systems. Several of these tasks require extensive testing and data for specific languages and accents, especially when the goal is to develop systems for communicating with machines. In this work, we explore the application of several classifiers to the task of discriminating speech from music in Costa Rican radio broadcasts. This discrimination is a first step in exploring a large corpus, to determine whether the available information is useful for particular research areas. The main contribution of this exploratory work is the general procedure and selection of algorithms for the Costa Rican radio corpus, which can lead to extensive use of this data source in many of our own applications and systems.

How to Cite
Joseline, & Marvin. (2022). Comparison of four classifiers for speech-music discrimination: a first case study for Costa Rican radio broadcasting. Tecnología En Marcha Journal, 35(8), pp. 119–127. https://doi.org/10.18845/tm.v35i8.6463
Section
Scientific article
