ALMAnaCH, Inria

Description

SYNTAX has been designed and developed at Inria over several decades by Pierre Boullier, within the ALMAnaCH, ALPAGE and ATOLL teams and earlier teams, with the help of Philippe Deschamp and Benoît Sagot, among others. SYNTAX is a system that, given a context-free grammar, can (i) build the corresponding optimised automaton, and (ii) using a parsing library, execute the resulting automaton on source texts. SYNTAX can process deterministic grammars of the LR class, broader classes of grammars, and general context-free grammars. The deterministic version has been used in industrial environments (for Ada, for example). The non-deterministic version is at the heart of various parsers for natural language, including the SxLFG parser, which is based on the LFG (Lexical-Functional Grammar) formalism, and the SxPipe preprocessing chain.

The current version of SYNTAX (version 6.0 beta) also includes parser generators for other formalisms: context-sensitive formalisms (TAG, RCG), and formalisms based on context-free grammars supplemented with attribute computations, in particular for natural language processing (LFG formalism).

SYNTAX has been ported to various architectures, including 64-bit architectures, mainly by the former VASY team (Inria Rhône-Alpes), now CONVECS.

SYNTAX is distributed under a free licence: the SYNTAX library and the code produced by SYNTAX are under the CeCILL-C licence (similar to the LGPL), while the source code of the constructors is under the CeCILL licence (similar to the GPL).

In addition, SYNTAX now has an English Wikipedia page and a French Wikipedia page.

History

Historically, SYNTAX's main application was the construction of parsers for programming languages, in the field of compilation (C, Ada, etc.). It was therefore mainly a generator of deterministic context-free parsers, akin to the UNIX programs Lex and Yacc. Compared with these programs, SYNTAX had three strengths:

a larger variety and a higher expressive power in the semantic mechanisms that can be used during or after parsing,
high-performing error recovery mechanisms [Boullier 87],
the generation of compact and high-performance parsers (more efficient than with Yacc).

SYNTAX version 3.9 is the culmination of research in the field of deterministic parsing. There is a user manual for SYNTAX version 3.5 [Boullier 88]. Most uses of SYNTAX for compiling programming languages rely on this technology, either in the form of SYNTAX 3.9 (Ada compiler, etc.) or in its evolved form in the current version of SYNTAX.

For the past twenty years, SYNTAX development has focused on natural language processing applications. This has led to four types of development:

the ability to process (deterministically) a larger class of context-free languages, in particular RLR languages (a context-free language is said to be RLR if, for any conflict in the LR(0) automaton, there is a regular grammar that can parse the remainder of the input string up to a point where the conflict can be resolved, which amounts to a form of unbounded lookahead); this resulted in version 4 of SYNTAX;
non-deterministic context-free parsing, with Earley- and GLR-like parser generators; this resulted in SYNTAX version 5;
context-sensitive (non-deterministic) parsing, with an RCG parser generator;
guided parsing (for both context-free grammars and RCGs); this and the previous item resulted in SYNTAX version 6.
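
The idea of resolving a conflict with a regular, unbounded lookahead can be illustrated with a minimal sketch. It assumes a hypothetical toy grammar (S -> A Xs 'c' | B Xs 'd', A -> 'a', B -> 'a', Xs -> 'x' Xs | empty) that is not LR(k) for any fixed k: after reading 'a', the parser cannot choose between reducing to A or to B without scanning past an arbitrary number of 'x'. The function name and the enum below are illustrative assumptions and are not taken from SYNTAX.

```c
/* Possible outcomes of the lookahead scan (hypothetical names). */
enum decision { REDUCE_A, REDUCE_B, NO_DECISION };

/*
 * Toy grammar, assumed for illustration only:
 *   S  -> A Xs 'c' | B Xs 'd'
 *   A  -> 'a'
 *   B  -> 'a'
 *   Xs -> 'x' Xs | (empty)
 *
 * After shifting 'a' there is a reduce/reduce conflict (A -> 'a' versus
 * B -> 'a'). The remaining input is scanned with the regular language x*
 * until a character is reached that settles the conflict.
 */
enum decision resolve_by_regular_lookahead(const char *rest)
{
    while (*rest == 'x')        /* skip the regular part x* */
        rest++;
    if (*rest == 'c')
        return REDUCE_A;        /* only S -> A Xs 'c' can match */
    if (*rest == 'd')
        return REDUCE_B;        /* only S -> B Xs 'd' can match */
    return NO_DECISION;         /* neither alternative can match */
}
```

In this sketch the scan only decides which reduction to perform; the input consumed by the lookahead would still be parsed normally afterwards.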

Versions 4 and 5 are not distributed. Version 3 was available free of charge for research purposes, and version 6 is freely available under the CeCILL and CeCILL-C licences, depending on the files (see above, and the LICENSE file included in the package).

Architecture

SYNTAX is developed almost exclusively in C (only a few small recent extensions use Perl). SYNTAX does not directly produce parsers from grammars. It produces a set of data represented in C, which must be compiled and linked with various general modules. These data are tables, i.e. initialised C arrays, and functions.
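
As an illustration of this split between generated tables and generic code, here is a minimal sketch of a table-driven shift-reduce recogniser for a hypothetical toy grammar (S -> '(' S ')' | 'x'). The table layout, the action encoding and all names (`action_table`, `goto_table`, `rule_length`, `toy_parse`) are assumptions made for this example; they do not reflect the tables SYNTAX actually generates.

```c
#include <stddef.h>

/* Terminal codes for the toy grammar: 1: S -> '(' S ')'   2: S -> 'x' */
enum { T_LPAR, T_RPAR, T_X, T_EOF, T_COUNT };
enum { ACCEPT = 100, STACK_SIZE = 64 };

/* "Generated" part: initialised arrays (hand-written here).
 * Encoding: n > 0 shift to state n, n < 0 reduce by rule -n, 0 error. */
static const int action_table[6][T_COUNT] = {
    /* state 0 */ { 2,  0, 3,  0      },
    /* state 1 */ { 0,  0, 0,  ACCEPT },
    /* state 2 */ { 2,  0, 3,  0      },
    /* state 3 */ { 0, -2, 0, -2      },
    /* state 4 */ { 0,  5, 0,  0      },
    /* state 5 */ { 0, -1, 0, -1      },
};
static const int goto_table[6] = { 1, 0, 4, 0, 0, 0 }; /* goto on S */
static const int rule_length[3] = { 1, 3, 1 };          /* rhs lengths */

static int token_of(char c)
{
    switch (c) {
    case '(':  return T_LPAR;
    case ')':  return T_RPAR;
    case 'x':  return T_X;
    case '\0': return T_EOF;
    default:   return -1;       /* unknown character */
    }
}

/* "Generic" part: a driver that only interprets the tables.
 * Returns 1 if text belongs to the language, 0 otherwise. */
int toy_parse(const char *text)
{
    int stack[STACK_SIZE];
    int top = 0;
    size_t pos = 0;

    stack[0] = 0;
    for (;;) {
        int tok = token_of(text[pos]);
        if (tok < 0)
            return 0;
        int act = action_table[stack[top]][tok];
        if (act == ACCEPT)
            return 1;
        if (act > 0) {                      /* shift */
            if (top + 1 >= STACK_SIZE)
                return 0;                   /* nesting too deep for this sketch */
            stack[++top] = act;
            pos++;
        } else if (act < 0) {               /* reduce */
            top -= rule_length[-act];       /* pop the right-hand side */
            stack[top + 1] = goto_table[stack[top]];
            top++;
        } else {
            return 0;                       /* syntax error */
        }
    }
}
```

For instance, `toy_parse("((x))")` returns 1 and `toy_parse("(x")` returns 0. Changing the language only requires changing the arrays, while the driver stays the same, which is the point of keeping the grammar-specific data separate from the general modules.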

The core of SYNTAX consists mainly of three types of files:

a number of constructors, which build from a grammar the tables and functions specific to the corresponding parser, and which define, among other things, the grammar itself, the data for the construction of the lexer, those for the construction of the parser and those for error recovery;
the modules that contain the generic functions at the heart of each parser;
a large number of utility modules, which are one of the reasons for SYNTAX's efficiency, and which provide very efficient handling of strings, hash tables, arrays, bit vectors, sets, and more.
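
To give an idea of what such a utility module can look like, here is a minimal sketch of a bit vector used as a set of small integers. It is a generic illustration only: the type and function names (`bitvec`, `bitvec_new`, `bitvec_set`, `bitvec_test`, `bitvec_or`, `bitvec_free`) are hypothetical and are not SYNTAX's actual API.

```c
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* A fixed-size set of integers in [0, nbits), one bit per element. */
typedef struct {
    size_t nbits;
    size_t nwords;
    unsigned long *words;
} bitvec;

/* Allocate an empty bit vector; returns NULL on allocation failure. */
bitvec *bitvec_new(size_t nbits)
{
    bitvec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->nbits = nbits;
    v->nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
    v->words = calloc(v->nwords ? v->nwords : 1, sizeof *v->words);
    if (v->words == NULL) {
        free(v);
        return NULL;
    }
    return v;
}

void bitvec_free(bitvec *v)
{
    if (v != NULL) {
        free(v->words);
        free(v);
    }
}

/* Add element i to the set (ignored if out of range). */
void bitvec_set(bitvec *v, size_t i)
{
    if (i < v->nbits)
        v->words[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
}

/* Return 1 if element i is in the set, 0 otherwise. */
int bitvec_test(const bitvec *v, size_t i)
{
    if (i >= v->nbits)
        return 0;
    return (int)((v->words[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL);
}

/* In-place union: dst = dst | src, one machine word at a time. */
void bitvec_or(bitvec *dst, const bitvec *src)
{
    size_t n = dst->nwords < src->nwords ? dst->nwords : src->nwords;
    for (size_t w = 0; w < n; w++)
        dst->words[w] |= src->words[w];
}
```

Set operations such as the union above process a whole machine word per step, which is why this representation is commonly used for sets of states or symbols in parser construction.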

Finally, SYNTAX is bootstrapped: the constructors are themselves built using SYNTAX (that is, using those same constructors).

Download

SYNTAX is hosted on the Inria GForge. It should shortly be migrated to Inria's GitLab.

The last packaged version of SYNTAX and previous packaged versions can be downloaded from the download page of the SYNTAX project on the Inria GForge.

Contact

For more information or if you have any questions, please contact benoit.sagot[at]inria.fr

SYNTAX

Lexical and syntactic parser generator
