
goclassy

Asynchronous concurrent pipeline for classifying Common Crawl

Main website

Description

goclassy is an asynchronous, concurrent pipeline that classifies Common Crawl data by language. It is modeled on fastText's pipeline.
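To make the idea of an asynchronous concurrent classification pipeline concrete, here is a minimal sketch in Go. It is not taken from goclassy's source: the names `Result`, `classify`, and `classifyAll` are hypothetical, and `classify` is a toy stand-in for a real language-identification model such as fastText's. It only illustrates the general pattern of fanning lines out to a pool of workers over channels and collecting the labeled results as they arrive.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Result pairs an input line with its predicted label.
type Result struct {
	Line  string
	Label string
}

// classify is a hypothetical placeholder for a real language classifier.
// It labels a line "fr" if it contains a few accented characters, else "en".
func classify(line string) string {
	if strings.ContainsAny(line, "éèàçù") {
		return "fr"
	}
	return "en"
}

// classifyAll distributes lines across a pool of worker goroutines and
// groups the lines by predicted label. Order within a group is not fixed.
func classifyAll(lines []string, workers int) map[string][]string {
	if workers < 1 {
		workers = 1
	}
	in := make(chan string)
	out := make(chan Result)

	// Workers: each one reads lines until the input channel is closed.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range in {
				out <- Result{Line: line, Label: classify(line)}
			}
		}()
	}

	// Producer: feeds lines without blocking the collector below.
	go func() {
		for _, l := range lines {
			in <- l
		}
		close(in)
	}()

	// Closer: signals the collector once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	// Collector: groups results by label as they arrive.
	byLabel := make(map[string][]string)
	for r := range out {
		byLabel[r.Label] = append(byLabel[r.Label], r.Line)
	}
	return byLabel
}

func main() {
	lines := []string{"hello world", "où est la gare", "good morning", "ça va très bien"}
	for label, group := range classifyAll(lines, 3) {
		fmt.Printf("%s: %d lines\n", label, len(group))
	}
}
```

In this sketch the producer, the workers, and the collector all run concurrently, so a slow classification call does not stall reading or writing. A real pipeline would stream compressed Common Crawl files rather than an in-memory slice.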

Citation and publication(s)

If you use this work, please cite the following:

Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary. 2019. Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures.
In 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7). Leibniz-Institut für Deutsche Sprache. Cardiff, United Kingdom.
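HAL: https://inria.hal.science/hal-02148693
PDF: https://inria.hal.science/hal-02148693v1/file/Asynchronous_Pipeline_for_Processing_Huge_Corpora_on_Medium_to_Low_Resource_Infrastructures.pdf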
@inproceedings{ortizsuarez:hal-02148693,
 address = {Cardiff, United Kingdom},
 author = {Ortiz Su{\'a}rez, Pedro Javier and Sagot, Beno{\^i}t and Romary, Laurent},
 title = {{Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures}},
 year = {2019},
 booktitle = {{7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)}},
 publisher = {{Leibniz-Institut f{\"u}r Deutsche Sprache}},
 editor = {Piotr Ba{\'n}ski and Adrien Barbaresi and Hanno Biber and Evelyn Breiteneder and Simon Clematide and Marc Kupietz and Harald L{\"u}ngen and Caroline Iliadi},
 doi = {10.14618/IDS-PUB-9021},
 url = {https://inria.hal.science/hal-02148693},
 hal_pdf = {https://inria.hal.science/hal-02148693v1/file/Asynchronous_Pipeline_for_Processing_Huge_Corpora_on_Medium_to_Low_Resource_Infrastructures.pdf},
}

Contact

For more information, or if you have any questions, please contact Pedro Ortiz Suarez:

pedro.ortiz-suarez[at]inria.fr