Assessing Costa Rican children’s speech recognition by humans and machines
Abstract
In recent years, a growing number of studies on human-computer interaction have been carried out, driven by the pervasive speech interfaces implemented in systems such as cell phones and personal and home automation assistants. These studies cover automatic speech recognition (ASR) and speech synthesis, and they consider a widening variety of signal conditions, such as noise and reverberation, as well as accents and age-related effects. One of the key challenges is the development of ASR for children’s speech. Because current systems depend on language and accent, improving them requires research into speech recognition technologies suitable for children. In this paper, we assess commercial ASR systems for the recognition of Costa Rican children’s speech, for users aged between three and fourteen years. To compare and numerically validate the ASR systems in recognizing children’s isolated words, we conducted a large subjective listening test that quantifies the differences and the challenges that remain for state-of-the-art ASR systems. The results show clear numeric differences between ASR systems and human perception, especially for younger children. Additionally, we provide suggestions for future research directions in the field.
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, as well as the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided that they clearly indicate that the work was first published in this journal.