
MRELMo

ELMo language models for 5 mid-resource languages (Bulgarian, Catalan, Danish, Finnish, Indonesian)

Main website

Description

Language models obtained by training the ELMo architecture on the Bulgarian, Catalan, Danish, Finnish and Indonesian subcorpora of the OSCAR large-coverage multilingual corpus (Ortiz Suárez et al., 2019).
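
For reference, here is a minimal sketch of how one of these models might be loaded for inference, assuming the standard ELMo distribution format (an options.json file plus an .hdf5 weight file) and the allennlp library; the file names below are placeholders, not the actual released file names.

# Minimal sketch (assumptions: standard ELMo options/weights files, allennlp installed).
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "elmo_da_options.json"   # hypothetical file name for the Danish model
weight_file = "elmo_da_weights.hdf5"    # hypothetical file name for the Danish model

# One output representation (a learned mix of the ELMo layers), no dropout.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Pre-tokenized sentences; batch_to_ids converts them to character id tensors.
sentences = [["Dette", "er", "en", "test", "."]]
character_ids = batch_to_ids(sentences)

output = elmo(character_ids)
embeddings = output["elmo_representations"][0]  # shape: (batch_size, num_tokens, embedding_dim)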

Logo by Alix Chagué.

Citation and publication(s)

If you use this work, please cite the following:

Pedro Javier Ortiz Suárez, Laurent Romary and Benoît Sagot. 2020. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1703–1714, Online. Association for Computational Linguistics.
HAL PDF
@inproceedings{ortiz-suarez-etal-2020-monolingual,
  author = {Ortiz Su{\'a}rez, Pedro Javier and Romary, Laurent and Sagot, Beno{\^\i}t},
  title = {A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year = {2020},
  address = {Online},
  publisher = {Association for Computational Linguistics},
  pages = {1703--1714},
  doi = {10.18653/v1/2020.acl-main.156},
  url = {https://aclanthology.org/2020.acl-main.156},
  hal_url = {https://hal.inria.fr/hal-02863875},
  hal_pdf = {https://hal.inria.fr/hal-02863875v2/file/ELMos.pdf},
}

Contact

For more information or if you have any questions, please contact Pedro Ortiz Suárez and Benoît Sagot:

pedro.ortiz-suarez[at]inria.fr and Benoit.Sagot[at]inria.fr