
MRELMo

ELMo language models for 5 mid-resource languages (Bulgarian, Catalan, Danish, Finnish, Indonesian)

Main website

Description

Language models obtained by training the ELMo architecture on the Bulgarian, Catalan, Danish, Finnish and Indonesian subcorpora of the OSCAR large-coverage multilingual corpus (Ortiz Suárez et al., 2019).
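
For reference, here is a minimal sketch of how one of these models might be loaded for inference, assuming the standard ELMo distribution format (an options.json file plus an .hdf5 weight file) and the allennlp library; the file names below are placeholders, not the actual released file names.

# Minimal sketch (assumptions: standard ELMo options/weights files, allennlp installed).
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "elmo_da_options.json"   # hypothetical file name for the Danish model
weight_file = "elmo_da_weights.hdf5"    # hypothetical file name for the Danish model

# One output representation (a learned mix of the ELMo layers), no dropout.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Pre-tokenized sentences; batch_to_ids converts them to character id tensors.
sentences = [["Dette", "er", "en", "test", "."]]
character_ids = batch_to_ids(sentences)

output = elmo(character_ids)
embeddings = output["elmo_representations"][0]  # shape: (batch_size, num_tokens, embedding_dim)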

Logo by Alix Chagué.

Citation and publication(s)

If you use this work, please cite the following:

Pedro Javier Ortiz Suárez, Laurent Romary and Benoît Sagot. 2020. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1703–1714, Online. Association for Computational Linguistics.
HAL PDF
@inproceedings{ortiz-suarez-etal-2020-monolingual,
  author = {Ortiz Su{\'a}rez, Pedro Javier and Romary, Laurent and Sagot, Beno{\^\i}t},
  title = {A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year = {2020},
  address = {Online},
  publisher = {Association for Computational Linguistics},
  pages = {1703--1714},
  doi = {10.18653/v1/2020.acl-main.156},
  url = {https://aclanthology.org/2020.acl-main.156},
  hal_url = {https://hal.inria.fr/hal-02863875},
  hal_pdf = {https://hal.inria.fr/hal-02863875v2/file/ELMos.pdf},
}

Contact

For more information or if you have any questions, please contact Pedro Ortiz Suárez and Benoît Sagot:

pedro.ortiz-suarez[at]inria.fr and Benoit.Sagot[at]inria.fr