Investigating the Effect of Gender on Speech Emotion Recognition Using the First 13 Mel-Frequency Cepstral Coefficients

Document Type: Original Article


1 Department of Cognitive Linguistics, Institute for Cognitive Sciences Studies (ICSS), Tehran, Iran

2 Independent Researcher, Houston, TX


Speech is the most significant form of human communication for exchanging different types of information. Words and grammatical content are only one part of the message; other information, such as the age, gender, and emotional state of the speaker, is also exchanged and influences the context and meaning of the message. Speech Emotion Recognition (SER) is a qualitative study of the non-verbal emotion carried by speech intonation. SER plays an important role in human-machine interfaces and automatic service systems. This study investigates the effect of gender on SER. The first thirteen Mel-frequency cepstral coefficients (MFCCs), extracted from the audio signal of emotional speech, are used with different classification algorithms. The proposed SER algorithm is trained on 85% of the samples from women and men, and the remaining 15% are used for testing. The results show slightly better accuracy in recognizing emotions for women than for men. By emotion, anger was better recognized in men, while boredom, disgust, and sadness were better recognized in women.
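
The feature extraction and data split described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, FFT size, number of mel filters, and the function names `mfcc` and `split_85_15` are all assumptions chosen for illustration (the defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz). Only the use of the first thirteen coefficients and the 85%/15% split come from the text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sample_rate, n_mfcc=13, n_filters=26,
         frame_len=400, hop=160, n_fft=512):
    """Return the first n_mfcc cepstral coefficients per frame, shape (n_frames, n_mfcc).

    All defaults except n_mfcc=13 are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame, then mel-band energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II of the log mel energies; keep the first n_mfcc terms.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)
    basis = np.cos(np.pi * k[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_mel @ basis.T

def split_85_15(n_samples, seed=0):
    """Shuffle sample indices and split them 85% train / 15% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(0.85 * n_samples))
    return order[:n_train], order[n_train:]
```

One possible way to use the sketch is to compute `mfcc(signal, sample_rate)` for each utterance, summarize the frames (for example by averaging over time, which is an assumption and not stated in the abstract), and then train a classifier on the indices returned by `split_85_15`, evaluating on the held-out 15% separately for female and male speakers.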

