Conference paper, Year: 2017

Perception of expressivity in TTS: linguistics, phonetics or prosody?

Abstract

Currently, much of the work on expressive speech focuses on acoustic models and prosody variations. However, in expressive Text-to-Speech (TTS) systems, prosody generation relies strongly on the sequence of phonemes to be expressed and also on the words underlying these phonemes. Consequently, linguistic and phonetic cues play a significant role in the perception of expressivity. In previous work, we proposed a statistical corpus-specific framework which adapts phonemes derived from an automatic phonetizer to the phonemes as labelled in the TTS speech corpus. This framework makes it possible to synthesize good-quality but neutral speech samples. The present study goes further in the generation of expressive speech by predicting not only corpus-specific but also expressive pronunciation. It also investigates the shared impacts of linguistics, phonetics and prosody. These impacts are evaluated on different French neutral and expressive speech data collected with different speaking styles and linguistic content and expressed under diverse emotional states. Perception tests show that expressivity is more easily perceived when linguistics, phonetics and prosody are consistent. Linguistics seems to be the strongest cue in the perception of expressivity, but phonetics greatly improves expressiveness when combined with an adequate prosody.
Main file
SLSP2017_Tahon_final.pdf (421.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01623916, version 1 (25-10-2017)
hal-01623916, version 2 (10-09-2018)
hal-01623916, version 3 (10-09-2018)

Cite

Marie Tahon, Gwénolé Lecorvé, Damien Lolive, Raheel Qader. Perception of expressivity in TTS: linguistics, phonetics or prosody? Statistical Language and Speech Processing, Oct 2017, Le Mans, France. pp. 262-274, ⟨10.1007/978-3-319-68456-7_22⟩. ⟨hal-01623916v3⟩
465 views
393 downloads
