
DiscEvalMT

Contrastive test sets for the evaluation of discourse phenomena in English-to-French machine translation


Description

For machine translation (MT) to tackle discourse phenomena, models must be able to handle extra-sentential linguistic context. There has been recent interest in modelling context in neural machine translation (NMT), but models have been principally evaluated with standard automatic metrics, poorly adapted to evaluating discourse phenomena. These test sets provide an alternative way of evaluating NMT models for several discourse phenomena relevant to English-to-French translation. They are contrastive test sets, containing for each source sentence a correct and incorrect translation, which are to be ranked by NMT models: the models are scored on their ability to rank the correct translations higher than the incorrect translations.
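As a rough illustration of the scoring described above, the accuracy of a model on a contrastive test set can be sketched as follows. This is a minimal sketch and not the evaluation script from the repository: the function name `contrastive_accuracy` and the input format (one pair of model scores per example, correct translation first, higher meaning better) are assumptions made for the example.

```python
from typing import Iterable, Tuple


def contrastive_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Return the fraction of examples in which the correct translation
    receives a strictly higher model score than the incorrect one.

    Each pair is (score_of_correct, score_of_incorrect); this layout is a
    hypothetical convention for the sketch, not the repository's file format.
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("no examples to evaluate")
    wins = sum(1 for correct, incorrect in pairs if correct > incorrect)
    return wins / len(pairs)


# Made-up scores (e.g. log-probabilities assigned by an NMT model).
example_scores = [(-1.2, -3.4), (-2.0, -1.5), (-0.7, -0.9), (-4.1, -4.8)]
accuracy = contrastive_accuracy(example_scores)
print(f"accuracy = {accuracy:.2f}")
```

Ties are counted as failures here, which is one possible convention; a model that cannot separate the two translations gets no credit for that example.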

The particularity of these test sets is that the correctness of the translations depends entirely on linguistic context in the previous sentence, which the models must use in order to score highly. The test sets are balanced such that a baseline model that does not take context into account necessarily scores an accuracy of 50%. The sets test two types of phenomena: (i) anaphora and (ii) lexical choice, including both lexical cohesion and coherence. All examples (200 for each phenomenon type) are hand-crafted, inspired by similar examples in the parallel corpus OpenSubtitles2016 (in terms of vocabulary usage, style and syntactic formulation).
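The 50% baseline can be illustrated with a small sketch. It assumes that the balance is obtained by pairing each source sentence with two different contexts, so that each candidate translation is correct in one context and incorrect in the other. The example sentences, the `ContrastiveExample` type and the scorer are invented for illustration and are not taken from the test sets; `context` stands for whatever preceding material the model is given.

```python
from typing import Callable, List, NamedTuple


class ContrastiveExample(NamedTuple):
    context: str    # previous sentence, which decides which translation is right
    source: str     # current English sentence
    correct: str    # French translation that fits the context
    incorrect: str  # French translation that does not


Scorer = Callable[[str, str, str], float]


def accuracy(examples: List[ContrastiveExample], score: Scorer) -> float:
    """Fraction of examples where the correct translation scores higher."""
    wins = sum(
        1
        for ex in examples
        if score(ex.context, ex.source, ex.correct)
        > score(ex.context, ex.source, ex.incorrect)
    )
    return wins / len(examples)


# Invented toy pair: same source sentence, two contexts, roles swapped.
balanced_pair = [
    ContrastiveExample("I bought a table.", "It is broken.",
                       "Elle est cassée.", "Il est cassé."),
    ContrastiveExample("I bought a desk.", "It is broken.",
                       "Il est cassé.", "Elle est cassée."),
]


def context_free_score(context: str, source: str, translation: str) -> float:
    """Hypothetical baseline: the context argument is ignored, so each
    translation always receives the same (arbitrary) score."""
    preferences = {"Il est cassé.": -1.0, "Elle est cassée.": -2.0}
    return preferences[translation]


baseline_accuracy = accuracy(balanced_pair, context_free_score)
print(baseline_accuracy)
```

Because the baseline prefers the same translation in both contexts, it is right in exactly one of the two, whichever translation it happens to prefer.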

Download

DiscEvalMT can be downloaded from its GitHub repository.

DiscEvalMT is distributed under a CC-BY-SA-4.0 licence.

Citation and publication(s)

If you use this work, please cite the following:

Evaluating Discourse Phenomena in Neural Machine Translation

Rachel Bawden, Rico Sennrich, Alexandra Birch and Barry Haddow. 2018. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1304-1313. New Orleans, United States.
@inproceedings{bawden_Evaluating-Discourse-Phenomena-in_2018,
  author = {Bawden, Rachel and Sennrich, Rico and Birch, Alexandra and Haddow, Barry},
  title = {{Evaluating Discourse Phenomena in Neural Machine Translation}},
  year = {2018},
  booktitle = {Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  address = {New Orleans, United States},
  pages = {1304--1313},
  url = {https://hal.archives-ouvertes.fr/hal-01800739},
  pdf = {https://hal.archives-ouvertes.fr/hal-01800739/file/contextNMT.pdf},
}

Contact

For more information, or if you have any questions, please contact Rachel Bawden:

rachel.bawden[at]inria.fr