Uncertainty estimation for a speech recognition system

Walter Morales-Muñoz; Saúl Calderón-Ramírez

doi:10.18845/tm.v37i7.7305

Published: Sept. 9, 2024

Keywords:

Uncertainty, Speech Recognition, ASR, Whisper, Monte Carlo Dropout

Walter Morales-Muñoz

Instituto Tecnológico de Costa Rica

https://orcid.org/0000-0002-3888-4951

Saúl Calderón-Ramírez

Instituto Tecnológico de Costa Rica

https://orcid.org/0000-0001-9993-4388

Abstract

Whisper is a speech recognition system developed by OpenAI and trained on 680,000 hours of
multilingual, multitask supervised data collected from the web. This research adapts Monte Carlo
Dropout, combined with the Levenshtein distance, to estimate the score uncertainty of this system,
using Spanish-labeled audio data contaminated with a certain amount of noise. Preliminary results
show a linear relationship between the uncertainty estimate and the Word Error Rate (WER) of the
transcriptions. Furthermore, the number of insertions or omissions in the transcriptions tends to
be low.
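
The abstract does not spell out how the uncertainty score is computed, so the following is only a minimal sketch of one plausible reading: the same audio is transcribed several times with dropout left active, and the disagreement among those transcriptions is measured with the word-level Levenshtein distance. The function names, the pairwise-mean aggregation, and the example strings are assumptions made for illustration; they are not taken from the article, and the stochastic transcriptions are supplied as plain strings rather than produced by Whisper.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref, hyp) / len(ref)


def mc_dropout_uncertainty(transcriptions):
    """Assumed aggregation: mean pairwise word-level Levenshtein distance
    among transcriptions from repeated stochastic forward passes."""
    tokenized = [t.split() for t in transcriptions]
    if len(tokenized) < 2:
        return 0.0
    distances = [levenshtein(a, b) for a, b in combinations(tokenized, 2)]
    return sum(distances) / len(distances)


# Hypothetical outputs of three dropout-enabled passes over one noisy clip.
passes = ["el gato come pescado", "el gato come pescado", "el pato come pescado"]
print(mc_dropout_uncertainty(passes))
print(wer("el gato come pescado", passes[2]))
```

Under this reading, identical transcriptions across passes give an uncertainty of zero, and the score grows as the passes diverge; plotting it against the WER per clip is one way to check for the linear relationship the abstract reports.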

How to Cite

Morales-Muñoz, W., & Calderón-Ramírez, S. (2024). Uncertainty estimation for a speech recognition system. Tecnología En Marcha Journal, 37(7), 97–103. https://doi.org/10.18845/tm.v37i7.7305

Issue

2024: Vol. 37, special issue. IEEE International Conference on BioInspired Processing

Section

Scientific article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The authors retain copyright and grant the journal the right of first publication, along with the right to edit, reproduce, distribute, exhibit, and communicate the work domestically and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly state that the work was first published in this journal.

References

Díaz, C., Calderon-Ramirez, S., & Aguilar, L. D. M. (2022). Data quality metrics for unlabelled datasets. In 2022 IEEE 4th International Conference on BioInspired.

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.

Mena, J., Pujol, O., & Vitria, J. (2021). A survey on uncertainty estimation in deep learning classification systems from a Bayesian perspective. ACM Computing Surveys.

Loquercio, A., Segu, M., & Scaramuzza, D. (2020). A general framework for uncertainty estimation in deep learning. IEEE Robotics and Automation Letters, 5(2), 3153–3160.

Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050–1059).

Jayashankar, T., Roux, J. L., & Moulin, P. (2020). Detecting audio attacks on ASR systems with dropout uncertainty. arXiv preprint arXiv:2006.019
