Conference paper

Perception of expressivity in TTS: linguistics, phonetics or prosody?

Marie Tahon 1 Gwénolé Lecorvé 1 Damien Lolive 1 Raheel Qader 1 
1 EXPRESSION - Expressiveness in Human Centered Data/Media
UBS - Université de Bretagne Sud, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract: Currently, much work on expressive speech focuses on acoustic models and prosody variations. However, in expressive Text-to-Speech (TTS) systems, prosody generation relies strongly on the sequence of phonemes to be expressed and on the words underlying these phonemes. Consequently, linguistic and phonetic cues play a significant role in the perception of expressivity. In previous work, we proposed a statistical corpus-specific framework that adapts the phonemes derived from an automatic phonetizer to the phonemes as labelled in the TTS speech corpus. This framework makes it possible to synthesize speech samples of good quality, but neutral ones. The present study goes further in the generation of expressive speech by predicting pronunciation that is not only corpus-specific but also expressive. It also investigates the shared impacts of linguistics, phonetics and prosody. These impacts are evaluated on different French neutral and expressive speech data collected with different speaking styles and linguistic content and expressed under diverse emotional states. Perception tests show that expressivity is more easily perceived when linguistics, phonetics and prosody are consistent. Linguistics seems to be the strongest cue in the perception of expressivity, but phonetics greatly improves expressiveness when combined with an adequate prosody.

Cited literature [21 references]
Contributor: Marie Tahon
Submitted on: Wednesday, October 25, 2017 - 18:26:00
Last modified on: Tuesday, October 19, 2021 - 23:58:59
Long-term archiving on: Friday, January 26, 2018 - 16:14:33


Files produced by the author(s)




Marie Tahon, Gwénolé Lecorvé, Damien Lolive, Raheel Qader. Perception of expressivity in TTS: linguistics, phonetics or prosody?. Statistical Language and Speech Processing, Oct 2017, Le Mans, France. pp.262-274, ⟨10.1007/978-3-319-68456-7_22⟩. ⟨hal-01623916v1⟩


