Application of Fisher semi-discriminant analysis for speaker diarization in Costa Rican radio broadcasts
Abstract
Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (or spoken corpora) for particular languages and accents. Such a corpus is a database of speech audio files and their corresponding text transcriptions. Among the steps and tasks these applications require, speaker diarization is one of the most relevant, because it aims to find boundaries in audio recordings according to who speaks in each fragment. Speaker diarization can be performed in a supervised or unsupervised way and is commonly applied to audio consisting of pure speech. In this work, we present a first annotated dataset and an analysis of speaker diarization for Costa Rican radio broadcasts, using two approaches: a classic one based on k-means clustering, and the more recent Fisher semi-discriminant analysis. We chose publicly available radio broadcasts and compared the applicability of both systems to the complete audio files, which also contain segments of music and challenging acoustic conditions. The results depend on the number of speakers in each broadcast, especially in terms of average cluster purity. They also show the need for further exploration and for combining these methods with other classification and segmentation algorithms, in order to better extract useful information from the dataset and support further development of speech corpora.
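The abstract reports average cluster purity without stating how it is computed. As a minimal sketch, assuming the definition commonly used in the diarization literature (each cluster's purity is the sum of squared per-speaker frame counts divided by the squared cluster size, and the average is weighted by cluster size), the metric could be computed as follows. The function name and the toy labels are illustrative assumptions, not taken from the article.

```python
from collections import Counter


def average_cluster_purity(cluster_ids, speaker_ids):
    """Average cluster purity for frame-level labels.

    cluster_ids: cluster label assigned to each frame (e.g. by k-means).
    speaker_ids: ground-truth speaker label of each frame.
    Assumed definition: purity_i = sum_j n_ij^2 / n_i^2, and the
    average is weighted by the cluster sizes n_i.
    """
    if len(cluster_ids) != len(speaker_ids):
        raise ValueError("label sequences must have the same length")
    if not cluster_ids:
        raise ValueError("label sequences must not be empty")

    # n_ij: number of frames in cluster i spoken by speaker j
    joint_counts = Counter(zip(cluster_ids, speaker_ids))
    # n_i: number of frames in cluster i
    cluster_sizes = Counter(cluster_ids)

    weighted_sum = 0.0
    for cluster, size in cluster_sizes.items():
        squared = sum(
            n * n for (c, _), n in joint_counts.items() if c == cluster
        )
        purity = squared / (size * size)
        weighted_sum += purity * size

    return weighted_sum / len(cluster_ids)


# Hypothetical toy example: four frames, two clusters, two speakers.
acp = average_cluster_purity([0, 0, 0, 1], ["a", "a", "b", "b"])
```

Under this definition a value of 1.0 means every cluster contains frames from a single speaker, and lower values indicate clusters that mix speakers.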
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, as well as the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.