Effect of Reverberation Phenomena on Text-independent Speaker Recognition Based Deep Learning

Document Type: Original Article

Authors

1 Communications and Electronics Department, Tanta High Institute of Engineering and Technology, Tanta, Egypt

2 Communications and Electronics Department, Faculty of Electronic Engineering, Manoufia University, Menouf, Egypt

3 Department of Robotics and Intelligent Machines, Faculty of Artificial Intelligence, Kafrelsheikh University, Egypt

4 Communications and Electronics Department, Faculty of Electronic Engineering, Manoufia University, Menouf, Egypt

5 Automatic Control Department, Faculty of Electronic Engineering, Manoufia University, Menouf, Egypt

6 Electrical Engineering Department, Faculty of Engineering, Minia University, Egypt

Abstract

Speaker recognition is an important biometric authentication technique, with applications in security and telecommunications. The main goal of a speaker recognition system is to determine who is speaking from voice characteristics. Much current research focuses on text-dependent speaker recognition, in which the utterance the speaker will say is known in advance. This paper uses a text-independent speaker recognition system, in which no prior knowledge of the content of the speakers’ utterances is available at any stage. Convolutional Neural Network (CNN) based feature extraction is extended to the text-independent speaker recognition task, and the effect of reverberation on speaker recognition is addressed. All speech signals are converted into images by computing their spectrograms, so that image processing concepts can be applied. A CNN model is proposed to enhance the performance of the system on reverberant signals. The proposed model is compared with a conventional benchmark model. Performance is measured by the recognition rate on clean and reverberant data.
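The preprocessing described in the abstract (a reverberant version of each speech signal, then a spectrogram image for the CNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the exponentially decaying noise impulse response, the `rt60` value, the 440 Hz test tone standing in for speech, and the function names `synthetic_rir`, `add_reverb`, and `log_spectrogram` are all assumptions made for illustration.

```python
import numpy as np


def synthetic_rir(fs, rt60=0.5, seed=0):
    """Hypothetical room impulse response: exponentially decaying white noise."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # The envelope falls by 60 dB (a factor of 10**-3) at t = rt60.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def add_reverb(signal, rir):
    """Convolve the signal with the impulse response and keep the original length."""
    wet = np.convolve(signal, rir)[: len(signal)]
    return wet / (np.max(np.abs(wet)) + 1e-12)


def log_spectrogram(signal, n_fft=512, hop=128):
    """Log-power spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power.T + 1e-10)


# Assumed stand-in for a speech utterance: a 440 Hz tone that stops at 0.5 s.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t) * (t < 0.5)

reverberant = add_reverb(clean, synthetic_rir(fs))
spec_clean = log_spectrogram(clean)
spec_reverb = log_spectrogram(reverberant)
```

In this sketch the clean signal is silent after 0.5 s, while the reverberant one carries a decaying tail into that region. The same smearing along the time axis appears in `spec_reverb`, which is the kind of distortion a spectrogram-based CNN would need to tolerate.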

Volume 28, ICEEM2019-Special Issue: 1st International Conference on Electronic Eng., Faculty of Electronic Eng., Menouf, Egypt, 7-8 Dec. 2019
Pages 19-23