
Some of these studies [Stoyanchev et al., 2014] have proposed improving the quality of transcriptions for many tasks such as keyword search, spoken language understanding, and other tasks that require post-editing of the ASR output. Other studies have focused on using the confusion networks (CN) produced by the ASR system to reduce the word error rate and to compute confidence measures [Mangu et al., 2000].

[Fusayasu et al., 2015] propose an approach to automatically correct erroneous words in confusion networks. It relies on contextual features and on the "Normalized Relevance Distance" as a semantic similarity measure between words that are far apart from each other. Confusion networks represent a set of alternative sentences and rely on posterior probabilities. They can also be used to improve the post-processing of ASR outputs, for example to propose alternative hypotheses when the automatic transcriptions are corrected by a human. However, the cohorts, or bins, of a confusion network, i.e. the sets of competing words between two nodes of a CN, do not have a fixed size and sometimes contain only one or two words. This limits how much a human annotator can be helped to correct a misrecognized word, since very few alternative word hypotheses are available. We propose to use both linguistic and acoustic embeddings to enrich the confusion networks a posteriori, in order to improve the post-processing of ASR outputs: for each recognized word, the words of its confusion list are added until every cohort reaches the same fixed size. For these experiments we used the substitution list Sub_Test together with the corresponding confusion networks (cohorts) produced by the LIUM ASR system. Figure 8.5 shows the percentage of cohorts in these CNs as a function of the number of alternative words (i.e. words competing with the 1-best hypothesis); cohorts with between 6 and 12 alternatives are grouped into a single class. Concretely, each hypothesis word h of Sub_Test is enriched with its nearest neighbors from the list List_h^SimInter until the cohort reaches a size of 6 (if the cohort already contains at least 6 words alternative to h, it is not enriched); this size seems suitable for displaying alternative words in a graphical interface. A minimal sketch of this enrichment rule follows.
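To make the rule concrete, the following is a minimal sketch in Python. It assumes a confusion network is represented as a list of cohorts, each a list of (word, posterior) pairs, and that a function nearest_neighbors(word, k) returns the k closest words to a given word in the combined linguistic/acoustic embedding space (standing in for List_h^SimInter); the names and the data layout are illustrative, not taken from the thesis.

# Minimal sketch of the cohort-enrichment rule described above (illustrative only).
from collections import Counter
from typing import Callable, List, Tuple

Cohort = List[Tuple[str, float]]          # (word, posterior probability)
ConfusionNetwork = List[Cohort]

TARGET_SIZE = 6  # fixed cohort size targeted for display in a graphical interface

def enrich_cohort(cohort: Cohort,
                  nearest_neighbors: Callable[[str, int], List[str]],
                  target_size: int = TARGET_SIZE) -> Cohort:
    # Leave empty cohorts and cohorts that already hold enough alternatives unchanged.
    if not cohort or len(cohort) >= target_size:
        return cohort
    best_word = max(cohort, key=lambda wp: wp[1])[0]    # 1-best hypothesis h
    present = {w for w, _ in cohort}
    enriched = list(cohort)
    # Ask for extra neighbors in case some of them are already in the cohort.
    for neighbor in nearest_neighbors(best_word, 2 * target_size):
        if len(enriched) >= target_size:
            break
        if neighbor not in present:
            enriched.append((neighbor, 0.0))            # added word carries no ASR posterior
            present.add(neighbor)
    return enriched

def enrich_confusion_network(cn: ConfusionNetwork,
                             nearest_neighbors: Callable[[str, int], List[str]]) -> ConfusionNetwork:
    return [enrich_cohort(cohort, nearest_neighbors) for cohort in cn]

def alternatives_histogram(cn: ConfusionNetwork, cap: int = 6) -> Counter:
    # Count cohorts by number of alternatives to the 1-best word,
    # grouping every size >= cap into a single class (as in Figure 8.5).
    return Counter(min(len(cohort) - 1, cap) for cohort in cn)

Added words are given a zero posterior only to distinguish them from the original ASR hypotheses; the actual ranking by List_h^SimInter similarity is abstracted behind nearest_neighbors.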

Automatic error region detection and characterization in LVCSR transcriptions of TV news shows, Acoustics, Speech and Signal Processing (ICASSP), 2012.

. Dufour, Characterizing and detecting spontaneous speech: Application to speaker role recognition, Speech Communication, vol.56, pp.1-18, 2014.
DOI : 10.1016/j.specom.2013.07.007

URL : https://hal.archives-ouvertes.fr/hal-01433222

S. T. Dumais, Latent semantic analysis. Annual review of information science and technology, pp.188-230, 2004.

J. L. Elman, Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.
DOI : 10.1207/s15516709cog1402_1

. Erhan, Why does unsupervised pre-training help deep learning, Journal of Machine Learning Research, vol.11, issue.Feb, pp.625-660, 2010.

. Erhan, The difficulty of training deep architectures and the effect of unsupervised pre-training, AISTATS, pp.153-160, 2009.

. Estève, The EPAC Corpus : Manual and Automatic Annotations of Conversational Speech in French Broadcast News, LREC, Malta, pp.17-23, 2010.

. Estève, Integration of Word and Semantic Features for Theme Identification in Telephone Conversations, 6th International Workshop on Spoken Dialog Systems, 2015.
DOI : 10.1109/SLT.2010.5700883

. Falavigna, Acoustic and word lattice based algorithms for confidence scores, INTERSPEECH, 2002.

M. Faruqui and C. Dyer, Community Evaluation and Exchange of Word Vectors at wordvectors.org, Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014.
DOI : 10.3115/v1/P14-5004

. Finkelstein, Placing search in context, Proceedings of the tenth international conference on World Wide Web , WWW '01, pp.406-414, 2001.
DOI : 10.1145/371920.372094

. Firat, Multi-way, multilingual neural machine translation with a shared attention mechanism. arXiv preprint, 2016.

J. Fiscus and J. Ajot, The Rich Transcription Speech-To-Text (STT) and Speaker Attributed STT (SASTT) Results, 2009.

. Fiscus, Results of the 2006 spoken term detection evaluation, Proc. SIGIR, pp.51-57, 2007.

. Freund, Experiments with a new boosting algorithm, ICML, pp.148-156, 1996.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.40, issue.4, pp.193-202, 1980.
DOI : 10.1007/BF00344251

. Fusayasu, Word-error correction of continuous speech recognition based on normalized relevance distance, IJCAI, pp.1257-1262, 2015.

M. J. Gales, Cluster adaptive training of hidden Markov models, IEEE transactions on speech and audio processing, pp.417-428, 2000.
DOI : 10.1109/89.848223

O. Galibert and J. Kahn, The first official REPERE evaluation, SLAM@INTERSPEECH, pp.43-48, 2013.

. Galliano, Corpus description of the Ester evaluation campaign for the rich transcription of French broadcast news, 5th international Conference on Language Resources and Evaluation (LREC), pp.315-320, 2006.

. Galliano, The ESTER phase II evaluation campaign for the rich transcription of French Broadcast News, Interspeech, pp.1149-1152, 2005.

. Galliano, The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, Interspeech, pp.2583-2586, 2009.

. Gao, WordRep: A Benchmark for Research on Learning Word Representations, arXiv preprint, 2014.

Detecting trends using Spearman's rank correlation coefficient, Environmental Forensics, pp.359-362, 2001.

. Gauvain, Transcription de la parole conversationnelle, Traitement Automatique des Langues, issue.3, pp.4535-4582, 2005.
URL : https://hal.archives-ouvertes.fr/hal-01434260

J.-L. Gauvain and C.-H. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, pp.291-298, 1994.
DOI : 10.1109/89.279278

. Hazen, Recognition confidence scoring and its use in speech understanding systems, Computer Speech & Language, vol.16, issue.1, pp.49-67, 2002.
DOI : 10.1006/csla.2001.0183

H. Hermansky and L. A. Cox, Perceptual Linear Predictive (PLP) Analysis-Resynthesis Technique, Final Program and Paper Summaries, 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1991.
DOI : 10.1109/ASPAA.1991.634094

D. Hillard and M. Ostendorf, Compensating for Word Posterior Estimation Bias in Confusion Networks, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
DOI : 10.1109/ICASSP.2006.1660230

. Hinton, Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

G. E. Hinton, Relaxation and its role in vision, 1978.

. Hinton, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

G. E. Hinton and R. S. Zemel, Autoencoders, minimum description length, and Helmholtz free energy, Advances in Neural Information Processing Systems, 1994.

. Hirschberg, Prosodic and other cues to speech recognition failures, Speech Communication, vol.43, issue.1-2, pp.155-175, 2004.
DOI : 10.1016/j.specom.2004.01.006

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, vol.79, pp.2554-2558, 1982.

J. J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences, vol.81, pp.3088-3092, 1984.
DOI : 10.1073/pnas.81.10.3088

. Hovy, OntoNotes, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers on XX, NAACL '06, pp.57-60, 2006.
DOI : 10.3115/1614049.1614064

D. H. Hubel and T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology, vol.160, issue.1, pp.106-154, 1962.
DOI : 10.1113/jphysiol.1962.sp006837

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, book in preparation for MIT Press, 2016.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint, 2015.

. Jaitly, Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition, INTERSPEECH. Citeseer, 2012.

S. Jalalvand and D. Falavigna, Stacked auto-encoder for ASR error detection and word error rate prediction, INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, pp.2142-2146, 2015.

F. Jelinek, Continuous speech recognition by statistical methods, Proceedings of the IEEE, pp.532-556, 1976.
DOI : 10.1109/PROC.1976.10159


. Jelinek, Perplexity - a measure of the difficulty of speech recognition tasks, The Journal of the Acoustical Society of America, vol.62, issue.S1, pp.62-63, 1977.
DOI : 10.1121/1.2016299

H. Jiang, Confidence measures for speech recognition: A survey, Speech Communication, vol.45, issue.4, pp.455-470, 2005.
DOI : 10.1016/j.specom.2004.12.004

M. I. Jordan, Serial order: A parallel distributed processing approach, Advances in Psychology, pp.471-495, 1997.

S. Juan, Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia, PhD thesis, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01314120

. Kamper, Deep convolutional acoustic word embeddings using word-pair side information, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2016.7472619

S. M. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.3, pp.400-401, 1987.
DOI : 10.1109/TASSP.1987.1165125

E. Lleida and R. C. Rose, Likelihood ratio decoding and confidence measures for continuous speech recognition, Proceeding of Fourth International Conference on Spoken Language Processing (ICSLP '96), pp.478-481, 1996.
DOI : 10.1109/ICSLP.1996.607158

. Luong, Better word representations with recursive neural networks for morphology, 2013.

. Maas, Word-level acoustic modeling with convolutional vector regression, ICML Workshop on Representation Learning, 2012.

L. van der Maaten and G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

. Mangu, Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Computer Speech & Language, vol.14, issue.4, pp.373-400, 2000.
DOI : 10.1006/csla.2000.0152

. Marcus, Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, pp.313-330, 1993.
DOI : 10.21236/ADA273556

J. E. Markel and A. H. Gray, Linear Prediction of Speech, 1982.
DOI : 10.1007/978-3-642-66286-7

. Martinez, The LIUM ASR and SLT systems for IWSLT 2015, 12th International Workshop on Spoken Language Translation, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01433206

J. Mauclair, Mesures de confiance en traitement automatique de la parole et applications, PhD thesis, 2006.

L. Medsker and L. C. Jain, Recurrent Neural Networks: Design and Applications, 1999.

S. Meignier and T. Merlin, LIUM SpkDiarization: an open source toolkit for diarization, CMU SPUD Workshop, 2010.

R. Memisevic, Gradient-based learning of higher-order image features, 2011 International Conference on Computer Vision, pp.1591-1598, 2011.
DOI : 10.1109/ICCV.2011.6126419

. Mikolov, Efficient Estimation of Word Representations in Vector Space, Proceedings of Workshop at ICLR, 2013.

. Mikolov, Recurrent neural network based language model, Interspeech, p.3, 2010.

. Mikolov, Extensions of recurrent neural network language model, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5528-5531, 2011.
DOI : 10.1109/ICASSP.2011.5947611


. Mikolov, Distributed representations of words and phrases and their compositionality, Advances in neural information processing systems, pp.3111-3119, 2013.

. Mikolov, Linguistic regularities in continuous space word representations, HLT-NAACL, pp.746-751, 2013.

A. Mnih and G. Hinton, Three new graphical models for statistical language modelling, Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp.641-648, 2007.
DOI : 10.1145/1273496.1273577

A. Mnih and Y. W. Teh, A fast and simple algorithm for training neural probabilistic language models, arXiv preprint, 2012.

. Mohri, Weighted finite-state transducers in speech recognition, Computer Speech & Language, vol.16, issue.1, pp.69-88, 2002.
DOI : 10.1006/csla.2001.0184

. Moreau, Confidence measure and incremental adaptation for the rejection of incorrect data, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.1807-1810, 2000.
DOI : 10.1109/ICASSP.2000.862105

. Moreno, A boosting approach for confidence scoring, INTERSPEECH, pp.2109-2112, 2001.

F. Morin and Y. Bengio, Hierarchical probabilistic neural network language model, AISTATS, pp.246-252, 2005.

. Murphy, Loopy belief propagation for approximate inference: An empirical study, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI), 1999.

. Servan, Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA, The Ninth International Conference on Spoken Language Processing, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160181

. Simonnet, Exploring the use of attention-based recurrent neural networks for spoken language understanding, NIPS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01433202

. Socher, Parsing natural scenes and natural language with recursive neural networks, Proceedings of the 28th international conference on machine learning (ICML-11), pp.129-136, 2011.

. Soto, Rescoring Confusion Networks for Keyword Search, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7088-7092, 2014.
DOI : 10.1109/ICASSP.2014.6854975

. Stemmer, Comparison and Combination of Confidence Measures, Text, Speech and Dialogue, pp.181-188, 2002.
DOI : 10.1007/3-540-46154-X_25

. Stolcke, Explicit word error minimization in n-best list rescoring, Eurospeech, pp.163-166, 1997.


. Stoyanchev, Localized detection of speech recognition errors, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.25-30, 2012.
DOI : 10.1109/SLT.2012.6424164

. Sutskever, On the importance of initialization and momentum in deep learning, ICML, vol.28, issue.3, pp.1139-1147, 2013.

. Tam, ASR error detection using recurrent neural network language model and complementary ASR, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2312-2316, 2014.
DOI : 10.1109/ICASSP.2014.6854012

. Tang, Sentiment Embeddings with Applications to Sentiment Analysis, IEEE Transactions on Knowledge and Data Engineering, vol.28, issue.2, pp.496-509, 2016.
DOI : 10.1109/TKDE.2015.2489653

E. Thibodeau-laufer, Algorithmes d'apprentissage profonds supervisés et non-supervisés : applications et résultats théoriques, 2014.

. Vincent, Stacked denoising autoencoders : Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

. Walker, Sphinx-4 : A flexible open source framework for speech recognition, 2004.

. Wang, Learning Fine-Grained Image Similarity with Deep Ranking, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1386-1393, 2014.
DOI : 10.1109/CVPR.2014.180

. Weintraub, Neural-network based measures of confidence for word recognition, ICASSP, 1997.

Y. Weiss and W. T. Freeman, Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology, Neural Computation, vol.13, issue.10, pp.2173-2200, 2001.
DOI : 10.1109/18.910585

. Wessel, A comparison of word graph and n-best list based confidence measures, EuroSpeech, 1999.

. Wessel, Confidence measures for large vocabulary continuous speech recognition, IEEE Transactions on Speech and Audio Processing, vol.9, issue.3, pp.288-298, 2001.
DOI : 10.1109/89.906002

. Weston, Wsabie : Scaling up to large vocabulary image annotation, IJCAI, pp.2764-2770, 2011.

. Woodland, Improvements in accuracy and speed in the HTK broadcast news transcription system, EUROSPEECH, 1999.

. Yao, Spoken language understanding using long short-term memory neural networks, 2014 IEEE Spoken Language Technology Workshop (SLT), 2014.
DOI : 10.1109/SLT.2014.7078572

K. Yao and G. Zweig, Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, arXiv preprint, 2015.

S. J. Young and P. C. Et-woodland, State clustering in hidden Markov model-based continuous speech recognition, Computer Speech & Language, vol.8, issue.4, pp.369-383, 1994.
DOI : 10.1006/csla.1994.1019