Assessing the effectiveness of diarization algorithms in Costa Rican children-adult speech according to age group and gender
Abstract
Speaker diarization is the task of automatically identifying the speakers in an audio recording and detecting the times at which each of them speaks. Several algorithms have improved performance on this task in recent years. However, diarization remains challenging in interaction scenarios, such as those between a child and an adult, where interruptions, fillers, laughter, and other elements may affect the detection and clustering of speech segments.
In this work, we perform an exploratory study of two diarization algorithms applied to child-adult interactions recorded in a studio, and assess the effectiveness of the algorithms across age groups and genders. All participants are native speakers of Costa Rican Spanish. The children are between 3 and 14 years old, and the interactions combine guided repetition of words or short phrases with natural speech.
The results show that age affects diarization performance, in both cluster purity and speaker purity, in a direct but non-linear fashion.
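The abstract names cluster purity and speaker purity without defining them. As an illustration only, the sketch below assumes a common frame-weighted definition: cluster purity is the fraction of frames that belong to the dominant reference speaker of their hypothesized cluster, and speaker purity is the fraction of frames that fall in the dominant cluster of their reference speaker. The function names, the labels, and the toy data are hypothetical, and the article itself may compute these measures differently (for example, as an unweighted average over clusters).

```python
from collections import Counter


def purity(groups, labels):
    """Frame-weighted purity of `groups` with respect to `labels`.

    Both arguments are equal-length sequences with one entry per frame.
    This is an assumed definition, not necessarily the one used in the article.
    """
    if len(groups) != len(labels):
        raise ValueError("label sequences must have the same length")
    if len(groups) == 0:
        raise ValueError("label sequences must be non-empty")
    # Count frames for every (group, label) pair.
    counts = Counter(zip(groups, labels))
    # For each group, keep the frame count of its dominant label.
    dominant = {}
    for (group, _label), n in counts.items():
        dominant[group] = max(dominant.get(group, 0), n)
    return sum(dominant.values()) / len(groups)


def cluster_purity(reference, hypothesis):
    """How exclusively each hypothesized cluster contains one true speaker."""
    return purity(hypothesis, reference)


def speaker_purity(reference, hypothesis):
    """How exclusively each true speaker is assigned to one cluster."""
    return purity(reference, hypothesis)


# Toy example: 10 frames, two true speakers, three hypothesized clusters.
reference = ["child"] * 4 + ["adult"] * 6
hypothesis = ["c0", "c0", "c0", "c1", "c1", "c1", "c1", "c2", "c2", "c2"]

print(cluster_purity(reference, hypothesis))
print(speaker_purity(reference, hypothesis))
```

In this toy case the adult's speech is split across two clusters, so speaker purity is lower than cluster purity. This shows why the two measures are reported together: over-segmenting inflates cluster purity while lowering speaker purity.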
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, as well as the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.