Journal article

Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO

Abstract: In this paper, we propose a global approach for a speech emotion recognition (SER) system using empirical mode decomposition (EMD). Its use is motivated by the fact that EMD combined with the Teager-Kaiser Energy Operator (TKEO) gives an efficient time-frequency analysis of non-stationary signals. In this method, each signal is decomposed using EMD into oscillating components called intrinsic mode functions (IMFs). TKEO is used to estimate the time-varying amplitude envelope and instantaneous frequency of a signal that is assumed to be an Amplitude Modulation-Frequency Modulation (AM-FM) signal. A subset of the IMFs is selected and used to extract features from the speech signal to recognize different emotions. The main contribution of our work is to extract novel features, named modulation spectral (MS) features and modulation frequency features (MFF), based on the AM-FM modulation model and to combine them with cepstral features. The combination of all features is expected to improve the performance of the emotion recognition system. Furthermore, we examine the effect of feature selection on SER system performance. For the classification task, Support Vector Machine (SVM) and Recurrent Neural Networks (RNN) are used to distinguish seven basic emotions. Two databases, the Berlin corpus and the Spanish corpus, are used for the experiments. On the Spanish emotional database, the RNN classifier with a combination of all features extracted from the IMFs enhances the performance of the SER system and achieves a 91.16% recognition rate. On the Berlin database, the combination of all features with the SVM classifier achieves an 86.22% recognition rate.
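To make the TKEO step in the abstract concrete, the following is a minimal sketch of the discrete Teager-Kaiser operator and of amplitude and frequency estimation for a single AM-FM component. The abstract does not say which energy separation algorithm the authors use; DESA-2 is assumed here for illustration only, and the function names `tkeo` and `desa2` are hypothetical. In the approach described above, such an estimator would be applied to each selected IMF; the EMD step itself is not shown.

```python
import numpy as np


def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than x (the two edges are dropped).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, fs, eps=1e-12):
    """Estimate amplitude envelope and instantaneous frequency (Hz).

    Assumption: DESA-2 is used as the energy separation algorithm; the
    source abstract does not name one. Output arrays have len(x) - 4
    samples, aligned with x[2:-2].
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]            # symmetric difference x[n+1] - x[n-1]
    psi_x = tkeo(x)[1:-1]         # trim so it aligns with psi_y
    psi_y = tkeo(y)

    # Guard against division by zero and negative energies on real signals.
    psi_x_safe = np.maximum(psi_x, eps)
    psi_y_safe = np.maximum(psi_y, eps)

    # DESA-2: omega = 0.5 * arccos(1 - psi_y / (2 * psi_x)), in rad/sample.
    cos_arg = np.clip(1.0 - psi_y_safe / (2.0 * psi_x_safe), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)

    # DESA-2: |a| = 2 * psi_x / sqrt(psi_y).
    amplitude = 2.0 * psi_x_safe / np.sqrt(psi_y_safe)
    frequency_hz = omega * fs / (2.0 * np.pi)
    return amplitude, frequency_hz
```

For a pure tone `a * cos(2 * pi * f0 * t + phi)` with `f0` below a quarter of the sampling rate, both estimates are constant and equal to `a` and `f0`. On real speech the energies can become negative or close to zero, which is why the sketch floors them at `eps`; smoothing the estimates is a common further step that is left out here.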
Document type: Journal article

Cited literature: 50 references

https://hal.archives-ouvertes.fr/hal-02432524
Contributor: Catherine Cléder
Submitted on: Wednesday, January 8, 2020 - 16:25:40
Last modified on: Friday, January 10, 2020 - 01:43:26
Long-term archiving on: Friday, April 10, 2020 - 00:15:26

File

SpeechCommunicationArticle_vCo...
Files produced by the author(s)

Citation

Leila Kerkeni, Youssef Serrestou, Kosai Raoof, Mohamed Mbarki, Mohamed Mahjoub, et al. Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO. Speech Communication, Elsevier : North-Holland, 2019, 114, pp.22 - 35. ⟨10.1016/j.specom.2019.09.002⟩. ⟨hal-02432524⟩

Metrics

Record views: 98

File downloads: 216