Assessing Costa Rican children speech recognition by humans and machines

Abstract

In recent years, a growing number of studies on human-computer interaction have been conducted, driven by the pervasive speech interfaces in systems such as cell phones and personal and home automation assistants. These studies cover automatic speech recognition (ASR) and speech synthesis, and consider an increasingly wide variety of signal conditions, such as noise and reverberation, as well as accents and age-related effects. One of the key challenges is the development of ASR for children’s speech. Because current systems depend on language and accent, improving them requires research into speech recognition technologies suitable for children. In this paper, we assess commercial ASR systems on the recognition of Costa Rican children’s speech, for speakers between three and fourteen years old. To compare and numerically validate the ASR systems’ recognition of children’s isolated words, we conducted a large subjective listening test that quantifies the differences and challenges that remain for state-of-the-art ASR systems. The results show clear numeric differences between ASR systems and human perception, especially for younger children. Additionally, we provide suggestions for future research directions in the field.

Article Details

How to Cite
Maribel, & Marvin. (2022). Assessing Costa Rican children speech recognition by humans and machines. Tecnología En Marcha Journal, 35(8), pp. 74–82. https://doi.org/10.18845/tm.v35i8.6453
Section
Scientific article
