Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

Natalia Tomashenko; Yuri Khokhlov; Yannick Estève

Preprint, Working Paper. Year: 2020


Natalia Tomashenko

Role: Author
PersonId : 17002
IdHAL : natalia-tomashenko
IdRef : 223393304

Laboratoire d'Informatique de l'Université du Mans

Yuri Khokhlov

Role: Author

Yannick Estève

Role: Author
PersonId : 11645
IdHAL : yannick-esteve
ORCID : 0000-0002-3656-8883
IdRef : 070531668

Laboratoire Informatique d'Avignon

Abstract

In this paper, we investigate GMM-derived (GMMD) features for the adaptation of deep neural network (DNN) acoustic models. A DNN trained on GMMD features is adapted through maximum a posteriori (MAP) adaptation of the auxiliary GMM used for GMMD feature extraction. We explore the fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two neural network architectures: DNN and time-delay neural network (TDNN). We compare the proposed approach with other adaptation techniques, namely i-vectors and feature-space maximum likelihood linear regression (fMLLR), and explore their complementarity using several types of fusion (feature level, posterior level, lattice level, and others) to find the best way to combine them. Experimental results on the TED-LIUM corpus show that the proposed adaptation technique can be effectively integrated into DNN and TDNN setups at different levels and provides additional gains in recognition performance: up to 6% relative word error rate reduction (WERR) over a strong fMLLR speaker-adapted DNN baseline, and up to 18% relative WERR over a speaker-independent (SI) DNN baseline trained on conventional features. For TDNN models, the proposed approach achieves up to 26% relative WERR over an SI baseline and up to 13% over a model adapted with i-vectors. An analysis of the adapted GMMD features from various points of view demonstrates their effectiveness at different levels.
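The abstract does not spell out how the GMMD features are computed or how the MAP update is parameterized, so the following is only a minimal sketch under stated assumptions: the auxiliary model is a plain diagonal-covariance GMM, the features are its weighted per-component log-likelihoods, and MAP adaptation updates the means only, using the textbook relevance-factor form. All function names and the `tau` parameter are hypothetical and are not taken from the paper. The last helper shows how a relative WERR figure, as quoted in the abstract, is computed from two word error rates.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def component_log_likelihoods(x, weights, means, variances):
    """Assumed feature vector: log(w_k) + log N(x | mu_k, var_k) for each component k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]

def posteriors(log_likes):
    """Component posteriors from log-likelihoods (numerically stable softmax)."""
    top = max(log_likes)
    exps = [math.exp(l - top) for l in log_likes]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, tau=10.0):
    """Means-only MAP update (assumed form):
    mu_k' = (tau * mu_k + sum_t gamma_kt * x_t) / (tau + sum_t gamma_kt)."""
    dim = len(means[0])
    occupancy = [0.0] * len(means)
    first_order = [[0.0] * dim for _ in means]
    for x in frames:
        gammas = posteriors(component_log_likelihoods(x, weights, means, variances))
        for k, g in enumerate(gammas):
            occupancy[k] += g
            for d in range(dim):
                first_order[k][d] += g * x[d]
    return [
        [(tau * means[k][d] + first_order[k][d]) / (tau + occupancy[k])
         for d in range(dim)]
        for k in range(len(means))
    ]

def relative_werr(wer_baseline, wer_system):
    """Relative word error rate reduction, in percent."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# Toy one-dimensional example with made-up numbers (not data from the paper).
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
speaker_frames = [[1.0], [1.2], [0.8], [5.0], [5.2]]

adapted_means = map_adapt_means(speaker_frames, weights, means, variances, tau=2.0)
features = component_log_likelihoods(speaker_frames[0], weights, adapted_means, variances)
```

In this sketch the downstream network is left untouched: only the auxiliary GMM means move toward the speaker's data, and the features fed to the network change as a result. A larger `tau` keeps the adapted means closer to the speaker-independent ones, which is the usual safeguard when little adaptation data is available.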

Keywords

Acoustic model adaptation; Deep Neural Networks (DNN); Automatic Speech Recognition (ASR); Gaussian Mixture Models (GMM); Speaker adaptation; GMM-derived (GMMD) features

Domains

Computer Science [cs]; Computation and Language [cs.CL]; Signal and Image Processing [eess.SP]

Main file

MR2_for_arc_clean.pdf (2.15 MB)

Origin: Files produced by the author(s)

Contributor: Natalia Tomashenko

https://hal.science/hal-02551714

Submitted on: Thursday, April 23, 2020, 09:28:44

Last modified on: Saturday, April 25, 2020, 01:19:18

Dates and versions

hal-02551714, version 1 (23-04-2020)

Identifiers

HAL Id: hal-02551714, version 1

Cite

Natalia Tomashenko, Yuri Khokhlov, Yannick Estève. Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models. 2020. ⟨hal-02551714⟩


Collections

UNIV-AVIGNON UNIV-LEMANS LIUM LIA

139 views

258 downloads
