Conference paper

Perception of expressivity in TTS: linguistics, phonetics or prosody?

Marie Tahon¹, Gwénolé Lecorvé¹, Damien Lolive¹, Raheel Qader¹
¹ EXPRESSION - Expressiveness in Human Centered Data/Media
UBS - Université de Bretagne Sud, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract: Much current work on expressive speech focuses on acoustic models and prosody variations. However, in expressive Text-to-Speech (TTS) systems, prosody generation strongly relies on the sequence of phonemes to be expressed and also on the words underlying these phonemes. Consequently, linguistic and phonetic cues play a significant role in the perception of expressivity. In previous work, we proposed a statistical corpus-specific framework which adapts the phonemes produced by an automatic phonetizer to the phonemes as labelled in the TTS speech corpus. This framework makes it possible to synthesize good-quality but neutral speech samples. The present study goes further in the generation of expressive speech by predicting not only corpus-specific but also expressive pronunciation. It also investigates the joint impact of linguistics, phonetics and prosody, evaluated on several French neutral and expressive speech corpora collected with different speaking styles and linguistic content and expressed under diverse emotional states. Perception tests show that expressivity is more easily perceived when linguistics, phonetics and prosody are consistent. Linguistics seems to be the strongest cue in the perception of expressivity, but phonetics greatly improves expressiveness when combined with an adequate prosody.
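The abstract only summarises the idea of adapting phonetizer output to the pronunciations labelled in a TTS corpus. As a rough illustration under our own assumptions, the sketch below aligns canonical and labelled phoneme sequences with a Levenshtein alignment, counts how each canonical phoneme was realized, and rewrites new input with the most frequent realization. This is a deliberately simple, context-free frequency model, not the authors' statistical framework; the function names `align`, `train` and `adapt` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def align(canonical, realized):
    """Levenshtein alignment of two phoneme lists.

    Returns (canonical, realized) pairs; None marks a deletion or insertion.
    """
    n, m = len(canonical), len(realized)
    # cost[i][j] = edit distance between canonical[:i] and realized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == realized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Backtrace from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (canonical[i - 1] != realized[j - 1]):
            pairs.append((canonical[i - 1], realized[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # canonical phoneme deleted
            i -= 1
        else:
            pairs.append((None, realized[j - 1]))  # extra phoneme inserted
            j -= 1
    return pairs[::-1]


def train(corpus):
    """Count, for each canonical phoneme, how it was labelled in the corpus."""
    table = defaultdict(Counter)
    for canonical, realized in corpus:
        for c, r in align(canonical, realized):
            if c is not None:  # insertions are ignored in this simplified model
                table[c][r] += 1
    return table


def adapt(canonical, table):
    """Replace each phoneme by its most frequent realization (None = drop it)."""
    out = []
    for c in canonical:
        r = table[c].most_common(1)[0][0] if c in table else c  # unseen: keep as is
        if r is not None:
            out.append(r)
    return out


# Toy French examples (hypothetical): schwa deleted twice, kept once.
corpus = [
    (["p", "ə", "t", "i"], ["p", "t", "i"]),  # "petit"
    (["s", "ə", "r", "a"], ["s", "r", "a"]),  # "sera"
    (["l", "ə"], ["l", "ə"]),                 # "le"
]
table = train(corpus)
print(adapt(["p", "ə", "t", "i", "t"], table))  # "petite"
```

Because the table ignores context, this toy model drops every schwa once deletion is the majority label, including in "le"; a realistic adaptation model would condition on neighbouring phonemes, words and, for expressive speech, the speaking style.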
Document type:
Conference paper

Cited literature: 21 references

https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916
Contributor: Marie Tahon
Submitted on: Monday, 10 September 2018, 12:07:17
Last modified on: Friday, 10 July 2020, 16:24:31
Long-term archiving on: Tuesday, 11 December 2018, 14:22:39

File

SLSP2017_Tahon_final.pdf
Files produced by the author(s)


Citation

Marie Tahon, Gwénolé Lecorvé, Damien Lolive, Raheel Qader. Perception of expressivity in TTS: linguistics, phonetics or prosody?. Statistical Language and Speech Processing, Oct 2017, Le Mans, France. pp.262-274, ⟨10.1007/978-3-319-68456-7_22⟩. ⟨hal-01623916v3⟩


Metrics

Record views: 133
File downloads: 115