A Comparison of Synthetic and Human Speech: an Evaluation by English as a Foreign Language Students in a Public Costa Rican University

William Charpentier-Jiménez

Abstract

The possible role of text-to-speech (TTS) audio for pedagogical purposes has not been fully explored. This study examines EFL students’ perceptions of artificial intelligence and human voices. It also explores students’ opinions on listening instruction. The investigation was conducted from April to September 2022 and involved 36 TESOL students enrolled in a BA in English or English teaching at a Costa Rican public university. It used a quantitative survey design. The researcher gathered student responses through a survey designed to collect students’ perceptions of computer-generated voices, human voices, and listening instruction. The data were analyzed quantitatively using descriptive statistics. Data analyses indicate that: 1) students find human voices more appealing than artificial intelligence voices; 2) students find female voices more appealing than male voices when a computer generates them; 3) artificial intelligence voices share some characteristics that students find more appealing; and 4) current listening instruction policies and materials should be reexamined in the language program. Consistent with the reviewed literature, these findings demonstrate that although TTS does not appeal to students as much as human voices, a part of the population finds computer-generated voices appealing. The analysis also suggests that some students cannot fully distinguish between computer-generated and human voices; thus, the use of computer-generated voices may be appropriate in some contexts. Finally, these findings confirm that listening instruction policies and materials should be revised to improve students’ language acquisition processes.

Article Details

How to Cite
Charpentier-Jiménez, W. (2023). A Comparison of Synthetic and Human Speech: an Evaluation by English as a Foreign Language Students in a Public Costa Rican University. Revista Comunicación, 32(2), 41–58. https://doi.org/10.18845/rc.v32i2.6988
Section
Articles
