A Comparison of Synthetic and Human Speech: an Evaluation by English as a Foreign Language Students in a Public Costa Rican University

William Charpentier-Jiménez

doi:10.18845/rc.v32i2.6988


Published: Dec. 8, 2023

DOI: https://doi.org/10.18845/rc.v32i2.6988

Keywords:

Artificial intelligence, foreign language teaching, higher education, listening teaching materials, text-to-speech

William Charpentier-Jiménez

Universidad de Costa Rica, Costa Rica

Abstract

The potential role of text-to-speech (TTS) audio for pedagogical purposes has not been fully explored. This study examines EFL students' perceptions of human and artificial intelligence voices. It also explores students' opinions on listening instruction. The research was conducted from April to September 2022 and included 36 EFL students enrolled in a bachelor's program in English or English Teaching at a public Costa Rican university. A quantitative survey design was used. The researcher collected responses through a survey designed to gather students' perceptions of computer-generated voices, human voices, and listening instruction. The data were analyzed quantitatively using descriptive statistics. The analysis indicates that: 1) students find human voices more appealing than AI-generated voices; 2) students consider female voices more appealing than male voices when the voices are computer-generated; 3) AI-generated voices share some characteristics that students find more appealing; and 4) current policies and materials for listening instruction in the language program should be reexamined. Consistent with the literature reviewed, these results show that although TTS voices do not attract students' attention as much as human voices, part of the population finds computer-generated voices interesting. The analysis also suggests that some students cannot fully distinguish between human and computer-generated voices; therefore, the use of TTS voices may be appropriate in some contexts. Finally, the results confirm that listening instruction policies and materials should be revised to improve students' language acquisition processes.

How to cite

Charpentier-Jiménez, W. (2023). A Comparison of Synthetic and Human Speech: an Evaluation by English as a Foreign Language Students in a Public Costa Rican University. Revista Comunicación, 32(2), 41–58. https://doi.org/10.18845/rc.v32i2.6988

Issue

Vol. 32 No. 2 (2023): Revista Comunicación 2-2023

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Open access policy

This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.

Being an open access journal means that all content is freely available at no cost to the user or institution. Users may read, download, copy, distribute, print, and search the articles in this journal without asking prior permission from the publisher or the author, for educational, non-profit purposes.

The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (Budapest Open Access Initiative)

LICENSING AND INTELLECTUAL PROTECTION
All published articles are protected by a Creative Commons 3.0 license (Creative Commons Attribution – NonCommercial – NoDerivs) for Costa Rica. See the license at: http://creativecommons.org/licenses/by-nc-sa/3.0/cr/

These licenses complement traditional copyright, under the following terms:

a. Derivative works are not permitted (that is, the document may not be altered, transformed, or expanded).
b. Authorship of the document must always be acknowledged.
c. No document published in Revista Comunicación may be used for commercial purposes of any kind.

Through these licenses, the journal guarantees authors that their work is legally protected under both national and international law. Accordingly, when the alteration, modification, or partial or total plagiarism of one of this journal's publications is demonstrated, the infringement will be submitted to international arbitration, since it violates the publication rules of those who participate in the journal and of the journal itself. The institution affiliated with Creative Commons for verification in case of damages and for the protection of these works is the Instituto Tecnológico de Costa Rica, through the Editorial Tecnológica and the Vicerrectoría de Investigación.

