Language models obtained by training the ELMo architecture on the Bulgarian, Catalan, Danish, Finnish and Indonesian subcorpora of the OSCAR large-coverage multilingual corpus (Ortiz Suárez et al., 2019).
Logo by Alix Chagué.
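The models can be used like any other ELMo checkpoint. Below is a minimal usage sketch with AllenNLP's ElmoEmbedder (available in allennlp versions before 2.0, which still ship the ELMo modules); the options.json and weights.hdf5 file names are placeholders, not necessarily the names used in this release.

# Minimal sketch: embedding a sentence with one of these ELMo models.
# File paths are hypothetical; point them at the options/weights files
# distributed with the language you downloaded (e.g., the Bulgarian model).
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder(
    options_file="options.json",  # hypothetical path: model hyperparameters
    weight_file="weights.hdf5",   # hypothetical path: trained ELMo weights
)

# ELMo is character-based, so any pre-tokenized sentence works.
tokens = ["Това", "е", "изречение", "."]  # "This is a sentence." in Bulgarian
vectors = elmo.embed_sentence(tokens)

# Shape: (3 layers, num_tokens, 1024); the layers are typically
# averaged or combined with learned weights for downstream tasks.
print(vectors.shape)  # (3, 4, 1024)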
If you use this work, please cite the following:
@inproceedings{ortiz-suarez-etal-2020-monolingual,
    title = {A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages},
    author = {Ortiz Su{\'a}rez, Pedro Javier and Romary, Laurent and Sagot, Beno{\^\i}t},
    booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
    publisher = {Association for Computational Linguistics},
    address = {Online},
    year = {2020},
    pages = {1703--1714},
    doi = {10.18653/v1/2020.acl-main.156},
    url = {https://aclanthology.org/2020.acl-main.156},
    hal_url = {https://hal.inria.fr/hal-02863875},
    hal_pdf = {https://hal.inria.fr/hal-02863875v2/file/ELMos.pdf},
}