J. L. Ba, J. R. Kiros, and G. E. Hinton, Layer normalization, arXiv preprint, 2016.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

F. Burlot and F. Yvon, Evaluating the morphological competence of machine translation systems, Proceedings of the Second Conference on Machine Translation, 2017.
DOI : 10.18653/v1/W17-4705
URL : https://hal.archives-ouvertes.fr/hal-01618387

O. Caglayan, M. García-Martínez, A. Bardet, W. Aransa, F. Bougares, and L. Barrault, NMTPY: A flexible toolkit for advanced neural machine translation systems, arXiv preprint, 2017.

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.

O. Firat and K. Cho, Conditional gated recurrent unit with attention mechanism, 2016.

M. García-Martínez, L. Barrault, and F. Bougares, Factored neural machine translation architectures, Proceedings of the International Workshop on Spoken Language Translation, p.16, 2016.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR vol.9, pp.249-256, 2010.

H. Inan, K. Khosravi, and R. Socher, Tying word vectors and word classifiers: A loss framework for language modeling, 2016.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412, 2014.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico et al., Moses: Open source toolkit for statistical machine translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pp.177-180, 2007.
DOI : 10.3115/1557769.1557821

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pp.1045-1048, 2010.

P. Paikens, L. Rituma, and L. Pretkalniņa, Morphological analysis with limited resources: Latvian example, Proceedings of the 19th Nordic Conference of Computational Linguistics, pp.267-277, 2013.

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL '02, pp.311-318, 2002.
DOI : 10.3115/1073083.1073135

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, Proceedings of the 30th International Conference on Machine Learning, ICML'13, Volume 28, JMLR.org, pp.1310-1318, 2013.

O. Press and L. Wolf, Using the Output Embedding to Improve Language Models, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp.157-163, 2017.
DOI : 10.18653/v1/E17-2025

A. Rousseau, XenC: An open-source tool for data selection in natural language processing, The Prague Bulletin of Mathematical Linguistics, vol.100, pp.73-82, 2013.
DOI : 10.2478/pralin-2013-0013
URL : https://hal.archives-ouvertes.fr/hal-01353496

H. Schwenk, Continuous-space language models for statistical machine translation, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01433882

R. Sennrich, O. Firat, K. Cho, A. Birch, B. Haddow et al., Nematus: a Toolkit for Neural Machine Translation, Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp.65-68, 2017.
DOI : 10.18653/v1/E17-3017

R. Sennrich, B. Haddow, and A. Birch, Improving Neural Machine Translation Models with Monolingual Data, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.86-96, 2016.
DOI : 10.18653/v1/P16-1009

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.1715-1725, 2016.
DOI : 10.18653/v1/P16-1162

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, 2014.

J. Straková, M. Straka, and J. Hajič, Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp.13-18, 2014.
DOI : 10.3115/v1/P14-5003

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Proceedings of the 27th International Conference on Neural Information Processing Systems, pp.3104-3112, 2014.

F. Vanden Berghen and H. Bersini, CONDOR, a new parallel, constrained extension of Powell's UOBYQA algorithm: Experimental results and comparison with the DFO algorithm, Journal of Computational and Applied Mathematics, vol.181, issue.1, pp.157-175, 2005.
DOI : 10.1016/j.cam.2004.11.029