Speech Communication Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO

Leila Kerkeni; Youssef Serrestou; Kosai Raoof; Mohamed Mbarki; Mohamed Mahjoub; Catherine Cléder

doi:10.1016/j.specom.2019.09.002

Journal article, Speech Communication, Year: 2019



Leila Kerkeni

Role: Author

Laboratoire d'Acoustique de l'Université du Mans

Laboratory of Advanced Technology and Intelligent Systems

Youssef Serrestou

Role: Author
PersonId : 846817

Laboratoire d'Acoustique de l'Université du Mans

Kosai Raoof

Role: Author
PersonId : 860371

Laboratoire d'Acoustique de l'Université du Mans

Mohamed Mbarki

Role: Author

Institut Supérieur des Sciences Appliquées et de Technologie de Sousse

Mohamed Mahjoub

Role: Author
PersonId : 740116
IdHAL : mohamed-mahjoub
ORCID : 0000-0002-6348-2743
IdRef : 226860027

Laboratory of Advanced Technology and Intelligent Systems

Catherine Cléder

Role: Author
PersonId : 18214
IdHAL : catherine-cleder
ORCID : 0000-0003-4104-3544
IdRef : 073451835

Centre de recherche en éducation de Nantes

Abstract

In this paper, we propose a global approach to speech emotion recognition (SER) based on empirical mode decomposition (EMD). Its use is motivated by the fact that EMD combined with the Teager-Kaiser Energy Operator (TKEO) gives an efficient time-frequency analysis of non-stationary signals. In this method, each signal is decomposed by EMD into oscillating components called intrinsic mode functions (IMFs). The TKEO is then used to estimate the time-varying amplitude envelope and instantaneous frequency of a signal that is assumed to follow an amplitude modulation-frequency modulation (AM-FM) model. A subset of the IMFs is selected and used to extract features from the speech signal in order to recognize different emotions. The main contribution of our work is to extract novel features, named modulation spectral (MS) features and modulation frequency features (MFF), based on the AM-FM modulation model, and to combine them with cepstral features. We expect the combination of all features to improve the performance of the emotion recognition system. Furthermore, we examine the effect of feature selection on SER system performance. For the classification task, a Support Vector Machine (SVM) and Recurrent Neural Networks (RNN) are used to distinguish seven basic emotions. Two databases, the Berlin corpus and the Spanish corpus, are used for the experiments. On the Spanish emotional database, the RNN classifier with a combination of all features extracted from the IMFs enhances the performance of the SER system and achieves a 91.16% recognition rate. On the Berlin database, the SVM classifier with the combination of all features achieves an 86.22% recognition rate.
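The abstract names the TKEO as the estimator of amplitude envelope and instantaneous frequency but gives no formulas. As a minimal sketch, assuming the standard discrete operator Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), the example below also applies the DESA-2 energy separation algorithm to recover amplitude and frequency. DESA-2 is an assumption here: the abstract does not say which separation scheme the authors use, and in the paper's pipeline the input would be a selected IMF rather than a raw sinusoid. The function names `tkeo` and `desa2` are illustrative only.

```python
import math


def tkeo(x):
    """Discrete Teager-Kaiser energy, psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """DESA-2 estimates (assumed scheme, not confirmed by the abstract).

    Returns two lists covering samples n = 2 .. len(x) - 3:
    amplitude envelope |a(n)| and frequency in radians per sample.
    Samples where the energies are not positive yield None.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1] on interior samples.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = tkeo(x)[1:-1]  # trimmed so it lines up with tkeo(y)
    psi_y = tkeo(y)
    amps, freqs = [], []
    for px, py in zip(psi_x, psi_y):
        if px <= 0.0 or py <= 0.0:
            amps.append(None)
            freqs.append(None)
            continue
        # Clamp to the arccos domain to absorb rounding error.
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(c))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


# Usage on a pure tone with hypothetical parameters A and OMEGA.
A, OMEGA = 0.8, 0.3
tone = [A * math.cos(OMEGA * n + 0.1) for n in range(200)]
energy = tkeo(tone)
amp_est, freq_est = desa2(tone)
```

For a pure tone A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), which is why a ratio of two such energies separates amplitude from frequency. The operator assumes a single oscillating component, which is the usual reason for pairing it with a decomposition such as EMD.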

Keywords

Classification; Human-computer interaction; Speech Emotion Recognition; EMD; TKEO; Feature extraction; Spectral modulation; Feature selection; Machine learning; RNN; SVM

Domains

Computing Environments for Human Learning

Main file

SpeechCommunicationArticle_vCorrigée.pdf (1.38 MB)

Origin: Files produced by the author(s)

Contributor: Catherine Cléder

https://hal.science/hal-02432524

Submitted on: Wednesday, January 8, 2020, 16:25:40

Last modified on: Thursday, September 21, 2023, 12:05:41

Long-term archiving on: Friday, April 10, 2020, 00:15:26

Dates and versions

hal-02432524 , version 1 (08-01-2020)

Identifiers

HAL Id : hal-02432524 , version 1
DOI : 10.1016/j.specom.2019.09.002

Cite

Leila Kerkeni, Youssef Serrestou, Kosai Raoof, Mohamed Mbarki, Mohamed Mahjoub, et al. Speech Communication Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO. Speech Communication, 2019, 114, pp. 22-35. ⟨10.1016/j.specom.2019.09.002⟩. ⟨hal-02432524⟩


Collections

UNIV-NANTES TICE CNRS UNIV-LEMANS LAUM NANTES-UNIVERSITE

114 views

317 downloads
