A Comparison of Synthetic and Human Speech: an Evaluation by English as a Foreign Language Students in a Public Costa Rican University
Abstract
The possible role of text-to-speech (TTS) audio for pedagogical purposes has not been fully explored. This study examines ESL students’ perceptions of artificial intelligence and human voices. It also explores students’ opinions on listening instruction. The investigation was conducted from April to September 2022 and involved 36 TESOL students enrolled in a BA in English or English teaching at a Costa Rican public university. It used a quantitative survey design. The researcher gathered student responses through a survey designed to collect students’ perceptions of computer-generated voices, human voices, and listening instruction. The data were quantitatively analyzed using descriptive statistics. Data analyses indicate that: 1) students find human voices more appealing than artificial intelligence voices; 2) students find female voices more appealing than male voices when a computer generates them; 3) artificial intelligence voices share some characteristics that students find more appealing; and 4) current listening instruction policies and materials should be reexamined in the language program. Consistent with the reviewed literature, these findings demonstrate that although TTS does not appeal to students as much as human voices, a part of the population finds computer-generated voices appealing. The analysis also suggests that some students cannot fully discern between computer-generated and human voices; thus, their use may be appropriate in some contexts. Finally, these findings confirm that listening instruction policies and materials should be revised to improve students’ language acquisition processes.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Open Access Policy
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
Being an open access journal means that all content is freely available at no cost to the user or institution. Users may read, download, copy, distribute, print, and search the articles in this journal without asking prior permission from the publisher or the author, for educational and non-profit purposes.
The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (Budapest Open Access Initiative)
LICENSING AND INTELLECTUAL PROPERTY PROTECTION
All published articles are protected under a Creative Commons 3.0 license (Creative Commons Attribution – NonCommercial – NoDerivs) of Costa Rica. See this license at: http://creativecommons.org/licenses/by-nc-sa/3.0/cr/
The licenses complement traditional copyright, under the following terms:
a. Derivative works are not permitted (that is, the document may not be altered, transformed, or expanded).
b. Authorship of the referenced document must always be acknowledged.
c. No document published in Revista Comunicación may be used for commercial purposes of any kind.
Through these licenses, the journal guarantees authors that their work is legally protected under both national and international law. Therefore, when the alteration, modification, or partial or total plagiarism of one of this journal's publications is demonstrated, the infringement will be submitted to international arbitration, since it violates the publication rules of those who participate in the journal and of the journal itself. The institution affiliated with Creative Commons for verification in case of damages and for the protection of these works is the Instituto Tecnológico de Costa Rica, through the Editorial Tecnológica and the Vicerrectoría de Investigación.