Improving Speech Emotion Recognition Method of Convolutional Neural Network

International Journal of Recent Engineering Science (IJRES)
© 2018 by IJRES Journal
Volume-5 Issue-3
Year of Publication : 2018
Authors : ZENG Runhua, ZHANG Shuqun
DOI: 10.14445/23497157/IJRES-V5I3P101

MLA Style: ZENG Runhua, ZHANG Shuqun. "Improving Speech Emotion Recognition Method of Convolutional Neural Network." International Journal of Recent Engineering Science 5.3 (2018): 1-7.

APA Style: ZENG Runhua, ZHANG Shuqun (2018). Improving Speech Emotion Recognition Method of Convolutional Neural Network. International Journal of Recent Engineering Science, 5(3), 1-7.

Abstract
In this paper, we study speech emotion recognition and propose an improved speech emotion recognition method based on a convolutional neural network (CNN). The method makes two improvements: it modifies the algorithm used to update the convolution kernel weights, and it transforms the data matrix of Mel-Frequency Cepstral Coefficients (MFCC) obtained by preprocessing the speech signal. The first change makes the kernel-weight update used when training a traditional CNN depend on the number of iterations; the second increases the differences between the phonetic features of different emotions. Together, these changes improve the expressive ability of the CNN. Experiments showed that the recognition error rate of the improved method was about 7% lower than that of the traditional method.
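The abstract does not state the modified update rule, so the sketch below only illustrates the general idea of a convolution-kernel update whose step depends on the iteration number. It assumes a single 2-D kernel applied to an MFCC-like matrix (frames x coefficients) with a squared-error loss, and it uses inverse-time decay, eta_t = eta0 / (1 + decay * t), as a hypothetical stand-in for the authors' rule. The function names and the parameters `eta0` and `decay` are illustrative and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, w: Matrix) -> Matrix:
    """Valid 2-D cross-correlation, the 'convolution' computed by a CNN layer."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(w[i][j] * x[p + i][q + j] for i in range(kh) for j in range(kw))
             for q in range(ow)]
            for p in range(oh)]


def loss(x: Matrix, w: Matrix, target: Matrix) -> float:
    """Squared-error loss L = 0.5 * sum((conv(x, w) - target)^2)."""
    y = conv2d_valid(x, w)
    return 0.5 * sum((y[p][q] - target[p][q]) ** 2
                     for p in range(len(y)) for q in range(len(y[0])))


def kernel_gradient(x: Matrix, w: Matrix, target: Matrix) -> Matrix:
    """dL/dw[i][j]: sum over outputs of (y - target) times the input under w[i][j]."""
    y = conv2d_valid(x, w)
    return [[sum((y[p][q] - target[p][q]) * x[p + i][q + j]
                 for p in range(len(y)) for q in range(len(y[0])))
             for j in range(len(w[0]))]
            for i in range(len(w))]


def learning_rate(t: int, eta0: float = 0.02, decay: float = 0.1) -> float:
    """Hypothetical iteration-dependent step size: eta_t = eta0 / (1 + decay * t)."""
    return eta0 / (1.0 + decay * t)


def update_kernel(x: Matrix, w: Matrix, target: Matrix, t: int,
                  eta0: float = 0.02, decay: float = 0.1) -> Matrix:
    """One gradient-descent step on the kernel, scaled by the step size for iteration t."""
    eta = learning_rate(t, eta0, decay)
    g = kernel_gradient(x, w, target)
    return [[w[i][j] - eta * g[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    # Toy 4 x 3 "MFCC" matrix (frames x coefficients); the values are made up.
    x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 2.0, 1.0]]
    target = conv2d_valid(x, [[0.5, -0.25], [0.1, 0.3]])  # made-up "true" kernel
    w = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(50):
        w = update_kernel(x, w, target, t)
    print("final loss:", loss(x, w, target))
```

With `eta0=0.02` and `decay=0.1`, the step size falls from 0.02 at t = 0 to 0.01 at t = 10, so early iterations move the kernel quickly and later ones make smaller refinements. Any other schedule that depends on t could be substituted in `learning_rate` without changing the rest of the sketch.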

Keywords
speech emotion recognition, Mel-frequency cepstral coefficients (MFCC), convolutional neural networks, recognition rate.