6.2.1 Performance of acoustic embeddings
6.2.2.1 Prosodic features
8.2.1 Spoken language understanding module of a dialogue system
8.2.1.1 Architecture, features
[…, 2014] proposed improving transcription quality for many tasks such as keyword search, spoken language understanding, and other tasks that require post-editing of ASR outputs. Other studies have used the confusion networks (CNs) produced by the ASR system to reduce the word error rate and to compute a confidence measure […, 2000].
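How a confusion network can serve both purposes is easiest to see through consensus decoding, the technique described in "Finding consensus in speech recognition: word error minimization and other applications of confusion networks" (2000): in each bin, the word with the highest posterior probability is kept, and that posterior doubles as its confidence score. The sketch below is a minimal illustration under assumptions of our own, not the implementation used in this work: the data layout (one dict per bin mapping each competing word to its posterior) and the `<eps>` token for deletion arcs are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a confusion network is a sequence of bins (cohorts),
# each bin mapping a competing word to its posterior probability.
ConfusionNetwork = List[Dict[str, float]]

# Assumed marker for the empty (deletion) arc of a bin.
EPS = "<eps>"


def consensus_decode(cn: ConfusionNetwork) -> List[Tuple[str, float]]:
    """Return the consensus hypothesis with one confidence score per word.

    In each bin the highest-posterior word is selected; its posterior is used
    as the word-level confidence. Bins won by the empty word emit nothing.
    """
    hypothesis = []
    for bin_ in cn:
        word, posterior = max(bin_.items(), key=lambda item: item[1])
        if word != EPS:
            hypothesis.append((word, posterior))
    return hypothesis


if __name__ == "__main__":
    cn = [
        {"le": 0.7, "les": 0.3},
        {"chat": 0.55, "chats": 0.4, EPS: 0.05},
        {EPS: 0.6, "est": 0.4},
        {"noir": 0.9, "noire": 0.1},
    ]
    for word, confidence in consensus_decode(cn):
        print(f"{word}\t{confidence:.2f}")
```

Because every bin is decided independently, the output is the word sequence that maximizes the expected number of correct words under the CN's posteriors, which is why this decoding tends to lower the word error rate compared with the single best path.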
[…, 2015] propose an approach for automatically correcting erroneous words in confusion networks. It relies on contextual features and on the Normalized Relevance Distance, used as a measure of semantic similarity between words that are far apart from each other. Confusion networks can also be used to improve the post-processing of ASR outputs: for example, they can supply alternative hypotheses when automatic transcriptions are corrected by a human.

However, the cohorts (or bins) of a confusion network, i.e. the sets of competing words between two nodes of a CN, do not have a fixed size and sometimes contain only one or two words. This limits how much a human annotator can be helped to correct a misrecognized word, since very few alternative word hypotheses are available. We propose to use both linguistic and acoustic embeddings to enrich confusion networks a posteriori, in order to improve the post-processing of ASR outputs. To do so, we enrich the confusion networks by adding, for each recognized word, the words of its confusion list until every cohort reaches the same fixed size. Confusion networks are used to represent a set of alternative sentences and rely on posterior probabilities. The authors in […]

For these experiments, we used the substitution list Sub_Test together with the corresponding confusion networks (cohorts) produced by the LIUM ASR system. Figure 8.5 shows the percentage of cohorts in these CNs as a function of the number of alternative words (i.e. words competing with the best hypothesis, or 1-best). Cohorts with a size between 6 and 12 are grouped into a single class […, 2011].
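The enrichment step and the cohort-size statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each cohort is a list whose first element is the 1-best word h, and each recognized word comes with a hypothetical, precomputed neighbour list ordered by decreasing similarity (a stand-in for the similarity list built from linguistic and acoustic embeddings). The target of 6 alternatives, and the rule that a cohort already holding at least 6 alternatives is left untouched, follow the description of the method; the function names and the `>12` bucket are illustrative only.

```python
from collections import Counter
from typing import Dict, List

# A cohort is a list of words: cohort[0] is the 1-best word h, the remaining
# entries are the alternatives competing with it in the same bin.
Cohort = List[str]


def enrich_cohort(cohort: Cohort,
                  neighbours: Dict[str, List[str]],
                  target_size: int = 6) -> Cohort:
    """Pad a non-empty cohort with the nearest neighbours of its 1-best word.

    neighbours[h] is a hypothetical list of words ordered by decreasing
    similarity to h. target_size counts alternatives to h, not h itself.
    A cohort that already has at least target_size alternatives is returned
    unchanged.
    """
    best, alternatives = cohort[0], list(cohort[1:])
    if len(alternatives) >= target_size:
        return list(cohort)
    seen = set(cohort)
    for candidate in neighbours.get(best, []):
        if len(alternatives) >= target_size:
            break
        if candidate not in seen:  # never duplicate a word already in the bin
            alternatives.append(candidate)
            seen.add(candidate)
    return [best] + alternatives


def cohort_size_distribution(cohorts: List[Cohort]) -> Dict[str, float]:
    """Percentage of cohorts per number of alternatives, merging sizes 6 to 12."""
    if not cohorts:
        return {}
    labels = []
    for cohort in cohorts:
        n = len(cohort) - 1  # alternatives competing with the 1-best
        if 6 <= n <= 12:
            labels.append("6-12")
        elif n > 12:
            labels.append(">12")  # assumption: binning above 12 is not specified
        else:
            labels.append(str(n))
    counts = Counter(labels)
    return {label: 100.0 * count / len(cohorts) for label, count in counts.items()}


if __name__ == "__main__":
    neighbours = {"chat": ["chats", "char", "chaton", "chant", "achat", "shah", "chatte"]}
    print(enrich_cohort(["chat", "chats"], neighbours))
    print(cohort_size_distribution([["a"], ["a", "b"], ["a"] + ["x"] * 7, ["a", "b"]]))
```

Candidates already present in the bin are skipped rather than counted, so an enriched cohort never shows the same word twice in a correction interface; if the neighbour list runs out, the cohort simply stays below the target size.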
… with its nearest neighbours in the list List_h^SimInter until the cohort size reaches 6 (if the cohort already contained at least 6 alternatives to h, it is not enriched). This size seems appropriate for displaying alternative words in a graphical interface.
Automatic error region detection and characterization in LVCSR transcriptions of TV news shows, Acoustics, Speech and Signal Processing (ICASSP), 2012. ,
Characterizing and detecting spontaneous speech: Application to speaker role recognition, Speech Communication, vol.56, pp.1-18, 2014. ,
DOI : 10.1016/j.specom.2013.07.007
URL : https://hal.archives-ouvertes.fr/hal-01433222
Latent semantic analysis, Annual Review of Information Science and Technology, pp.188-230, 2004. ,
Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990. ,
DOI : 10.1007/BF00308682
Why does unsupervised pre-training help deep learning, Journal of Machine Learning Research, vol.11, issue.Feb, pp.625-660, 2010. ,
The difficulty of training deep architectures and the effect of unsupervised pre-training, AISTATS, pp.153-160, 2009. ,
The EPAC Corpus: Manual and Automatic Annotations of Conversational Speech in French Broadcast News, LREC, Malta, pp.17-23, 2010. ,
Integration of Word and Semantic Features for Theme Identification in Telephone Conversations, 6th International Workshop on Spoken Dialog Systems, 2015. ,
DOI : 10.1109/SLT.2010.5700883
Acoustic and word lattice based algorithms for confidence scores, INTERSPEECH, 2002. ,
Community Evaluation and Exchange of Word Vectors at wordvectors.org, Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014. ,
DOI : 10.3115/v1/P14-5004
Placing search in context, Proceedings of the tenth international conference on World Wide Web, WWW '01, pp.406-414, 2001. ,
DOI : 10.1145/371920.372094
Multi-way, multilingual neural machine translation with a shared attention mechanism, arXiv preprint, 2016. ,
The Rich Transcription, Speech-To-Text (STT) and Speaker Attributed STT (SASTT) Results, 2009. ,
Results of the 2006 spoken term detection evaluation, Proc. SIGIR, pp.51-57, 2007. ,
Experiments with a new boosting algorithm, ICML, pp.148-156, 1996. ,
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.40, issue.4, pp.193-202, 1980. ,
DOI : 10.1007/BF00344251
Word-error correction of continuous speech recognition based on normalized relevance distance, IJCAI, pp.1257-1262, 2015. ,
Cluster adaptive training of hidden Markov models, IEEE Transactions on Speech and Audio Processing, pp.417-428, 2000. ,
DOI : 10.1109/89.848223
The first official REPERE evaluation, SLAM@INTERSPEECH, pp.43-48, 2013. ,
Corpus description of the Ester evaluation campaign for the rich transcription of French broadcast news, 5th international Conference on Language Resources and Evaluation (LREC), pp.315-320, 2006. ,
The ESTER phase II evaluation campaign for the rich transcription of French Broadcast News, Interspeech, pp.1149-1152, 2005. ,
The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, Interspeech, pp.2583-2586, 2009. ,
WordRep: A Benchmark for Research on Learning Word Representations ,
Detecting trends using Spearman's rank correlation coefficient, Environmental Forensics, pp.359-362, 2001. ,
Transcription de la parole conversationnelle, Traitement Automatique des Langues, issue.3, pp.4535-4582, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-01434260
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, pp.291-298, 1994. ,
DOI : 10.1109/89.279278
Recognition confidence scoring and its use in speech understanding systems, Computer Speech & Language, vol.16, issue.1, pp.49-67, 2002. ,
DOI : 10.1006/csla.2001.0183
Perceptual Linear Predictive (PLP) Analysis-Resynthesis Technique, Final Program and Paper Summaries 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pp.0-37, 1991. ,
DOI : 10.1109/ASPAA.1991.634094
Compensating for Word Posterior Estimation Bias in Confusion Networks, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp.I-I, 2006. ,
DOI : 10.1109/ICASSP.2006.1660230
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012. ,
DOI : 10.1109/MSP.2012.2205597
Relaxation and its role in vision, 1978. ,
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
Autoencoders, minimum description length, and Helmholtz free energy, Advances in Neural Information Processing Systems, pp.3-3, 1994. ,
Prosodic and other cues to speech recognition failures, Speech Communication, vol.43, issue.1-2, pp.155-175, 2004. ,
DOI : 10.1016/j.specom.2004.01.006
Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997. ,
DOI : 10.1162/neco.1997.9.8.1735
Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, vol.79, pp.2554-2558, 1982. ,
Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences, vol.81, pp.3088-3092, 1984. ,
DOI : 10.1073/pnas.81.10.3088
OntoNotes, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL '06, pp.57-60, 2006. ,
DOI : 10.3115/1614049.1614064
Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology, vol.160, issue.1, pp.106-154, 1962. ,
DOI : 10.1113/jphysiol.1962.sp006837
Deep learning, book in preparation, 2016. ,
Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint, 2015. ,
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition, INTERSPEECH, 2012. ,
Stacked auto-encoder for ASR error detection and word error rate prediction, INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, pp.2142-2146, 2015. ,
Continuous speech recognition by statistical methods, Proceedings of the IEEE, pp.532-556, 1976. ,
DOI : 10.1109/PROC.1976.10159
Perplexity: a measure of the difficulty of speech recognition tasks, The Journal of the Acoustical Society of America, vol.62, issue.S1, pp.62-63, 1977. ,
DOI : 10.1121/1.2016299
Confidence measures for speech recognition: A survey, Speech Communication, vol.45, issue.4, pp.455-470, 2005. ,
DOI : 10.1016/j.specom.2004.12.004
Serial order: A parallel distributed processing approach, Advances in Psychology, pp.471-495, 1997. ,
Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia, PhD thesis, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01314120
Deep convolutional acoustic word embeddings using word-pair side information, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. ,
DOI : 10.1109/ICASSP.2016.7472619
Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.3, pp.400-401, 1987. ,
DOI : 10.1109/TASSP.1987.1165125
Likelihood ratio decoding and confidence measures for continuous speech recognition, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, pp.478-481, 1996. ,
DOI : 10.1109/ICSLP.1996.607158
Better word representations with recursive neural networks for morphology, 2013. ,
Word-level acoustic modeling with convolutional vector regression, ICML Workshop on Representation Learning, 2012. ,
Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008. ,
Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Computer Speech & Language, vol.14, issue.4, pp.373-400, 2000. ,
DOI : 10.1006/csla.2000.0152
Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, pp.313-330, 1993. ,
DOI : 10.21236/ADA273556
Linear Prediction of Speech, 1982. ,
DOI : 10.1007/978-3-642-66286-7
The LIUM ASR and SLT systems for IWSLT 2015, 12th International Workshop on Spoken Language Translation, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01433206
Mesures de confiance en traitement automatique de la parole et applications, PhD thesis, 2006. ,
Recurrent neural networks : design and applications, 1999. ,
LIUM SpkDiarization: an open source toolkit for diarization, CMU SPUD Workshop, 2010. ,
Gradient-based learning of higher-order image features, 2011 International Conference on Computer Vision, pp.1591-1598, 2011. ,
DOI : 10.1109/ICCV.2011.6126419
Efficient Estimation of Word Representations in Vector Space, Proceedings of Workshop at ICLR, 2013. ,
Recurrent neural network based language model, Interspeech, p.3, 2010. ,
Extensions of recurrent neural network language model, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5528-5531, 2011. ,
DOI : 10.1109/ICASSP.2011.5947611
Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, pp.3111-3119, 2013. ,
Linguistic regularities in continuous space word representations, HLT-NAACL, pp.746-751, 2013. ,
Three new graphical models for statistical language modelling, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.641-648, 2007. ,
DOI : 10.1145/1273496.1273577
A fast and simple algorithm for training neural probabilistic language models, arXiv preprint, 2012. ,
Weighted finite-state transducers in speech recognition, Computer Speech & Language, vol.16, issue.1, pp.69-88, 2002. ,
DOI : 10.1006/csla.2001.0184
Confidence measure and incremental adaptation for the rejection of incorrect data, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.1807-1810, 2000. ,
DOI : 10.1109/ICASSP.2000.862105
A boosting approach for confidence scoring, INTERSPEECH, pp.2109-2112, 2001. ,
Hierarchical probabilistic neural network language model, AISTATS, pp.246-252, 2005. ,
Loopy belief propagation for approximate inference: An empirical study, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI), 1999. ,
Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA, The Ninth International Conference on Spoken Language Processing, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-01160181
Exploring the use of attention-based recurrent neural networks for spoken language understanding, NIPS, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01433202
Parsing natural scenes and natural language with recursive neural networks, Proceedings of the 28th international conference on machine learning (ICML-11), pp.129-136, 2011. ,
Rescoring Confusion Networks for Keyword Search, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7088-7092, 2014. ,
DOI : 10.1109/ICASSP.2014.6854975
Comparison and Combination of Confidence Measures, Text, Speech and Dialogue, pp.181-188, 2002. ,
DOI : 10.1007/3-540-46154-X_25
Explicit word error minimization in n-best list rescoring, Eurospeech, pp.163-166, 1997. ,
Localized detection of speech recognition errors, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.25-30, 2012. ,
DOI : 10.1109/SLT.2012.6424164
On the importance of initialization and momentum in deep learning, ICML, vol.28, issue.3, pp.1139-1147, 2013. ,
ASR error detection using recurrent neural network language model and complementary ASR, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2312-2316, 2014. ,
DOI : 10.1109/ICASSP.2014.6854012
Sentiment Embeddings with Applications to Sentiment Analysis, IEEE Transactions on Knowledge and Data Engineering, vol.28, issue.2, pp.496-509, 2016. ,
DOI : 10.1109/TKDE.2015.2489653
Algorithmes d'apprentissage profonds supervisés et non-supervisés : applications et résultats théoriques, 2014. ,
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010. ,
Sphinx-4: A flexible open source framework for speech recognition, 2004. ,
Learning Fine-Grained Image Similarity with Deep Ranking, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1386-1393, 2014. ,
DOI : 10.1109/CVPR.2014.180
Neural-network based measures of confidence for word recognition, p.3, 1997. ,
Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology, Neural Computation, vol.13, issue.10, pp.2173-2200, 2001. ,
DOI : 10.1109/18.910585
A comparison of word graph and n-best list based confidence measures, EuroSpeech, 1999. ,
Confidence measures for large vocabulary continuous speech recognition, IEEE Transactions on Speech and Audio Processing, vol.9, issue.3, pp.288-298, 2001. ,
DOI : 10.1109/89.906002
WSABIE: Scaling up to large vocabulary image annotation, IJCAI, pp.2764-2770, 2011. ,
Improvements in accuracy and speed in the HTK broadcast news transcription system, EUROSPEECH, 1999. ,
Spoken language understanding using long short-term memory neural networks, 2014 IEEE Spoken Language Technology Workshop (SLT), pp.2014-189, 2014. ,
DOI : 10.1109/SLT.2014.7078572
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, arXiv preprint, 2015. ,
State clustering in hidden Markov model-based continuous speech recognition, Computer Speech & Language, vol.8, issue.4, pp.369-383, 1994. ,
DOI : 10.1006/csla.1994.1019