Software and Resources

Language models

CamemBERT

Neural BERT-like language model for French

PAGnol

Neural GPT-based language model for French

FrELMo

ELMo language model for French

MRELMo

ELMo language models for 5 mid-resource languages (Bulgarian, Catalan, Danish, Finnish, Indonesian)

Lexicons

WOLF

Free Wordnet for French

OFrLex-modifier

Alexina

Morphological (and sometimes syntactic) lexicons (including the Lefff)

EtymDB

Etymological database extracted from Wiktionary

UDLexicons

Multilingual collection of morphological lexicons

Raw corpora

OSCAR

Huge multilingual web-based corpus

goclassy

Asynchronous concurrent pipeline for classifying Common Crawl

Treebanks

FQB

Multi-layered treebank made of questions for French

Sequoia corpus

French corpus with surface and deep syntactic annotations

FSMB

French social media bank

Other annotated corpora

VerDI project release

Machine translation

DiscEvalMT

Contrastive test sets for the evaluation of discourse phenomena in English-to-French machine translation

PFSMB

FR-EN parallel corpus of noisy user-generated content

DiaBLa

Parallel dataset of English-French bilingual dialogues

Text simplification

EASSE

Text Simplification Evaluation Library

ASSET

Text Simplification Evaluation Dataset

tseval

Text Simplification Evaluation Library

Parsing

FRMG

A large-coverage meta-grammar for French

dyalog-sr

Transition-based parser built on top of DyALog

DyALog

Environment for building tabular parsers and programs

Mgwiki

Linguistic Wiki for FRMG

SYNTAX

Lexical and syntactic parser generator

ELMoLex

Neural parsing system developed for ALMAnaCH's submission to the CoNLL-18 multilingual parsing shared task

Shallow processing and tagging

GROBID

Library for extracting, parsing and re-structuring raw documents

GROBID-Dictionaries

GROBID module for structuring digitised lexical resources and entry-based documents

SxPipe

Shallow language pipeline

entity-fishing

Entity recognition and disambiguation

MElt

Statistical part-of-speech tagger

Standardisation

SSK (fr) / Standardization Survival Kit (en)

SSK

Collection of research use case scenarios illustrating best practices in Digital Humanities and Heritage research

Industrial software

Enqi

vera

Automatic analysis of answers to open-ended questions in employee surveys