Chi Thien Nguyen,
Ho Chi Minh City University of Technology (HCMUT), Ho Chi Minh City, Vietnam
DOI: 10.36724/2664-066X-2024-10-5-16-21
SYNCHROINFO JOURNAL. Volume 10, Number 5 (2024). P. 16-21.
Abstract
In practice, noise degrades the result of speech signal recognition. Training speech signals are usually noise-free, whereas testing speech signals are noisy. Noise causes the spectra of the testing signals to deviate strongly from the spectra of their reference patterns (standards) in the training sample, so recognition quality drops sharply against a background of noise. The article reviews the problem of recognizing speech commands against background noise and proposes a trial amplification of the speech signal spectrum during recognition. A multiple algorithm for recognizing commands against background noise is compared with a single algorithm, and the developed numerical recognition algorithm is studied. Experimental results are reported for 11 speech commands from the TIDigits dataset.
Keywords: recognition of speech commands, noise, multiple algorithm
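The idea of trial amplification summarized in the abstract can be illustrated with a minimal sketch. It is one plausible reading and not the article's algorithm: the function name `recognize`, the Euclidean distance, the candidate gains, and the toy four-bin spectra are all assumptions made for illustration. A "single" run compares the test spectrum to each reference once, while a "multiple" run repeats the comparison for several trial gains and keeps the best match.

```python
import numpy as np

def recognize(test_spec, references, gains=(1.0,)):
    """Return (label, gain, distance) of the closest reference spectrum.

    Hypothetical helper: each trial gain scales the test magnitude spectrum
    before it is compared with every reference by Euclidean distance.
    """
    best = None
    for g in gains:
        scaled = g * test_spec
        for label, ref in references.items():
            d = float(np.linalg.norm(scaled - ref))
            if best is None or d < best[2]:
                best = (label, g, d)
    return best

# Toy reference spectra (assumed values, not taken from TIDigits).
references = {
    "one": np.array([1.0, 4.0, 2.0, 0.5]),
    "two": np.array([3.0, 1.0, 0.5, 2.0]),
}

# Test spectrum: the "one" reference at half amplitude.
test_spec = 0.5 * references["one"]

single = recognize(test_spec, references)                            # gain fixed at 1.0
multiple = recognize(test_spec, references, gains=(0.5, 1.0, 2.0))   # trial gains

print("single:  ", single)
print("multiple:", multiple)
```

In this toy case both runs select the same command, but the trial gain of 2.0 brings the test spectrum much closer to its reference, which is the kind of margin a multiple-trial scheme would rely on in noisy conditions.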