D. Bahdanau, K. Cho, and Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.

O. Caglayan, W. Aransa, Y. Wang, M. Masana, M. García-Martínez et al., Does Multimodality Help Human and Machine for Translation and Image Captioning?, Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pp.627-633, 2016.
DOI : 10.18653/v1/W16-2358

URL : https://hal.archives-ouvertes.fr/hal-01433183

O. Caglayan, L. Barrault, and F. Bougares, Multimodal Attention for Neural Machine Translation, arXiv preprint arXiv:1609.03976, 2016.

O. Caglayan, W. Aransa, A. Bardet, M. García-Martínez, F. Bougares et al., LIUM-CVC Submissions for WMT17 Multimodal Translation Task, Proceedings of the Second Conference on Machine Translation, 2017.

X. Chen, H. Fang, T. Lin, R. Vedantam, S. Gupta et al., Microsoft COCO Captions: Data Collection and Evaluation Server, arXiv preprint arXiv:1504.00325, 2015.

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv preprint arXiv:1412.3555, 2014.

D. Elliott, S. Frank, L. Barrault, F. Bougares, and L. Specia, Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description, Proceedings of the Second Conference on Machine Translation, 2017.

O. Firat and K. Cho, Conditional Gated Recurrent Unit with Attention Mechanism, 2016.

M. García-Martínez, L. Barrault, and F. Bougares, Factored Neural Machine Translation Architectures, Proceedings of the International Workshop on Spoken Language Translation (IWSLT'16), 2016.

M. García-Martínez, O. Caglayan, W. Aransa, A. Bardet, F. Bougares et al., LIUM Machine Translation Systems for WMT17 News Translation Task, Proceedings of the Second Conference on Machine Translation, 2017.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, pp.249-256, 2010.

I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, Maxout Networks, Proceedings of the 30th International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, pp.1319-1327, 2013.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1026-1034, 2015.
DOI : 10.1109/ICCV.2015.123

URL : http://arxiv.org/pdf/1502.01852

J. Helcl and J. Libovický, Neural Monkey: An Open-source Tool for Sequence Learning, The Prague Bulletin of Mathematical Linguistics, vol.107, pp.5-17, 2017.
DOI : 10.1515/pralin-2017-0001

URL : https://doi.org/10.1515/pralin-2017-0001

H. Inan, K. Khosravi, and R. Socher, Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, arXiv preprint arXiv:1611.01462, 2016.

D. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, arXiv preprint arXiv:1412.6980, 2014.

G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush, OpenNMT: Open-Source Toolkit for Neural Machine Translation, Proceedings of ACL 2017, System Demonstrations, 2017.
DOI : 10.18653/v1/P17-4012

URL : http://arxiv.org/pdf/1701.02810

A. Lavie and A. Agarwal, METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, Proceedings of the Second Workshop on Statistical Machine Translation, StatMT '07, pp.228-231, 2007.
DOI : 10.3115/1626355.1626389

A. Neelakantan, L. Vilnis, Q. V. Le, I. Sutskever, L. Kaiser et al., Adding gradient noise improves learning for very deep networks, arXiv preprint arXiv:1511.06807, 2015.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a Method for Automatic Evaluation of Machine Translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pp.311-318, 2002.
DOI : 10.3115/1073083.1073135

R. Pascanu, T. Mikolov, and Y. Bengio, On the Difficulty of Training Recurrent Neural Networks, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.1310-1318, 2013.

O. Press and L. Wolf, Using the Output Embedding to Improve Language Models, arXiv preprint arXiv:1608.05859, 2016.

A. M. Saxe, J. L. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, arXiv preprint arXiv:1312.6120, 2013.

R. Sennrich and B. Haddow, A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.114-121, 2015.
DOI : 10.18653/v1/D15-1248

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.1715-1725, 2016.
DOI : 10.18653/v1/P16-1162

R. Sennrich, O. Firat, K. Cho, A. Birch, B. Haddow et al., Nematus: a Toolkit for Neural Machine Translation, Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp.65-68, 2017.
DOI : 10.18653/v1/E17-3017

URL : http://arxiv.org/pdf/1703.04357

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

T. Tieleman and G. Hinton, Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012.

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.2048-2057, 2015.

M. D. Zeiler, ADADELTA: An Adaptive Learning Rate Method, arXiv preprint arXiv:1212.5701, 2012.