A. Alexandrescu and K. Kirchhoff, Factored neural language models, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL '06, 2006.
DOI : 10.3115/1614049.1614050
URL : http://ssli.ee.washington.edu/people/katrin/Papers/alexand-kirchhoff-hlt06.pdf

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

O. Caglayan, M. García-Martínez, A. Bardet, W. Aransa, F. Bougares et al., Nmtpy: A flexible toolkit for advanced neural machine translation systems. arXiv preprint, 2017.
DOI : 10.1515/pralin-2017-0035
URL : https://hal.archives-ouvertes.fr/hal-01647873

M. Cettolo, C. Girardi, and M. Federico, WIT3: Web inventory of transcribed and translated talks, Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pp.261-268, 2012.

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Chung, K. Cho, and Y. Bengio, A Character-level Decoder without Explicit Segmentation for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
DOI : 10.18653/v1/P16-1160
URL : http://arxiv.org/pdf/1603.06147

M. R. Costa-jussà and J. A. Fonollosa, Character-based Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016.
DOI : 10.18653/v1/P16-2058

O. Firat and K. Cho, Conditional gated recurrent unit with attention mechanism, 2016.

M. García-Martínez, L. Barrault, and F. Bougares, Factored neural machine translation architectures, Proceedings of the International Workshop on Spoken Language Translation (IWSLT'16), 2016.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10). Society for Artificial Intelligence and Statistics, 2010.

S. Jean, K. Cho, R. Memisevic, and Y. Bengio, On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1001
URL : http://arxiv.org/pdf/1412.2007

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico et al., Moses: Open source toolkit for statistical machine translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pp.177-180, 2007.
DOI : 10.3115/1557769.1557821

A. Lavie and A. Agarwal, METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments, Proceedings of the Second Workshop on Statistical Machine Translation, StatMT '07, 2007.
DOI : 10.3115/1626355.1626389

H. S. Le, I. Oparin, A. Messaoudi, A. Allauzen, J. L. Gauvain et al., Large vocabulary SOUL neural network language models, Proceedings of INTERSPEECH, 2011.

W. Ling, I. Trancoso, C. Dyer, and A. W. Black, Character-based neural machine translation. arXiv preprint arXiv:1511.04586, 2015.

T. Luong, I. Sutskever, Q. V. Le, O. Vinyals, and W. Zaremba, Addressing the Rare Word Problem in Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1002

A. Nasr, F. Béchet, J. F. Rey, B. Favre, and J. Le Roux, MACAON: An NLP tool suite for processing word lattices, Proceedings of the ACL-HLT 2011 System Demonstrations, pp.86-91, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00702442

J. Niehues, T. L. Ha, E. Cho, and A. Waibel, Using Factored Word Representation in Neural Network Language Models, Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers, pp.74-82, 2016.
DOI : 10.18653/v1/W16-2208

K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.311-318, 2002.
DOI : 10.3115/1073083.1073135

R. Pascanu, T. Mikolov, and Y. Bengio, Understanding the exploding gradient problem. arXiv preprint arXiv:1211.5063, 2012.

A. Rousseau, XenC: An open-source tool for data selection in natural language processing, The Prague Bulletin of Mathematical Linguistics, vol.100, pp.73-82, 2013.
DOI : 10.2478/pralin-2013-0013
URL : https://hal.archives-ouvertes.fr/hal-01353496

R. Sennrich, How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017.
DOI : 10.18653/v1/E17-2060
URL : https://doi.org/10.18653/v1/e17-2060

R. Sennrich and B. Haddow, Linguistic Input Features Improve Neural Machine Translation, Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers, 2016.
DOI : 10.18653/v1/W16-2209
URL : http://arxiv.org/pdf/1606.02892

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.1715-1725, 2016.
DOI : 10.18653/v1/P16-1162

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215, 2014.

Y. Wu, H. Yamamoto, X. Lu, S. Matsuda, C. Hori et al., Factored recurrent neural network language model in TED lecture transcription, Proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2012.

M. D. Zeiler, Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.