Assessing the effectiveness of transfer learning strategies in BLSTM networks for speech denoising
Abstract
Denoising speech signals is a challenging task due to the growing number of applications and technologies currently implemented in communication and portable devices. In those applications, adverse environmental conditions such as background noise, reverberation, and other sound artifacts can degrade signal quality. This in turn affects systems for speech recognition, speaker identification, and sound source localization, among many others. For denoising speech signals degraded by many kinds and possibly different levels of noise, several algorithms have been proposed over the past decades, with recent proposals based on deep learning representing the state of the art, in particular those based on Long Short-Term Memory networks (LSTM and Bidirectional LSTM, or BLSTM). In this work, a comparative study of different transfer learning strategies for reducing training time and increasing the effectiveness of this kind of network is presented. Reducing training time is one of the most critical challenges because of the high computational cost of training LSTM and BLSTM networks. The strategies arise from the different options for initializing the networks, using clean or noisy information of several types. Results show the convenience of transferring information from a single denoising network to the rest of the cases, with a significant reduction in training time while preserving the denoising capabilities of the BLSTM networks.
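The transfer strategies described above amount to initializing a new denoising network with weights learned under a different noise condition, then training only what remains. A minimal sketch of that idea, using plain NumPy arrays as stand-in layer weights (the layer names, sizes, and which layers are transferred are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def init_model(rng, n_in=129, n_hidden=64):
    """Create toy per-layer weight matrices for a denoising network.
    Sizes are placeholders, not the paper's architecture."""
    return {
        "blstm_fw": rng.standard_normal((n_in, n_hidden)),
        "blstm_bw": rng.standard_normal((n_in, n_hidden)),
        "output":   rng.standard_normal((2 * n_hidden, n_in)),
    }

def transfer_weights(source, target, layers=("blstm_fw", "blstm_bw")):
    """Copy the selected layers from a trained source model into a
    freshly initialized target model; layers not listed keep their
    random initialization and are learned from scratch."""
    for name in layers:
        target[name] = source[name].copy()
    return target

rng = np.random.default_rng(0)
trained_on_white_noise = init_model(rng)   # stands in for a trained network
new_babble_denoiser = init_model(rng)      # network for a new noise condition
new_babble_denoiser = transfer_weights(trained_on_white_noise,
                                       new_babble_denoiser)
```

After this initialization, the target network would be fine-tuned on the new noise condition; the saving comes from starting closer to a useful solution than random initialization does.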
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, including the right to edit, reproduce, distribute, exhibit, and communicate the article in the country and abroad through print and electronic media. Likewise, the authors assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), provided that they clearly indicate that the work was first published in this journal.