Comparison of Various Feature Selection Algorithms in Speech Emotion Recognition
Abstract
Speech Emotion Recognition (SER) plays a predominant role in human-machine interaction, and it is a challenging task because of the many complexities involved. For an accurate emotion classification system, feature extraction is the first and most important step carried out on the speech signals. Once the features are extracted, it is equally important to select the most informative ones and reject those that are redundant or of little value; feature selection methods therefore play an important role in SER performance. The classifier receives only the selected features, which reduces unnecessary computational load and improves emotion classification. In this study, a suitable combination of features is extracted from the Punjabi Emotional Speech Database, and a number of feature selection algorithms are explored and evaluated to select the best features. A 1D-CNN is used for classification, and the results are compared across several performance metrics. Among the feature selection methods examined, LASSO shows the best performance.
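As an illustration of the pipeline the abstract describes, the following is a minimal sketch of LASSO-based feature selection feeding a small 1D-CNN classifier, assuming scikit-learn and TensorFlow/Keras. The feature matrix, the alpha value, the layer sizes, and the four-emotion label set are illustrative placeholders, not the configuration used in the study.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
import tensorflow as tf

# Placeholder acoustic features: 200 utterances x 120 features
# (e.g. MFCCs, pitch, energy); integer labels for four emotions.
X = np.random.rand(200, 120)
y = np.random.randint(0, 4, size=200)

# Standardise so the L1 penalty treats all features on equal footing.
X_std = StandardScaler().fit_transform(X)

# LASSO drives the coefficients of redundant features to zero;
# SelectFromModel keeps only the features with non-zero weights.
selector = SelectFromModel(Lasso(alpha=0.01), threshold=1e-5)
X_sel = selector.fit_transform(X_std, y)
print(f"kept {X_sel.shape[1]} of {X.shape[1]} features")

# A small 1D-CNN over the selected feature vector (treated as a
# one-channel sequence), ending in a softmax over the emotion classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_sel.shape[1], 1)),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(128, 5, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_sel[..., np.newaxis], y, epochs=5, batch_size=32)
```

Wrapping an L1-penalised model in SelectFromModel is one common way to realise LASSO selection; the penalty strength, selection threshold, and network depth in the actual study would differ.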