SYNTAX has been designed and developed at Inria by Pierre Boullier over several decades, within the ALMAnaCH, ALPAGE and ATOLL teams and earlier ones, with the help of Philippe Deschamp and Benoît Sagot, amongst others. SYNTAX is a system that, given a context-free grammar, can (i) build the corresponding optimised automaton, and (ii) using a parsing library, run the resulting automaton on source texts. SYNTAX can process deterministic grammars of the LR class as well as broader classes, up to general context-free grammars. The deterministic version has been used in industrial environments (Ada, for example). The non-deterministic version is at the heart of various parsers for natural language, including the SxLFG parser, which is based on the LFG (Lexical-Functional Grammar) formalism, and the SxPipe preprocessing chain.
The current version of SYNTAX (version 6.0 beta) also includes parser generators for other formalisms, both context-sensitive formalisms (TAG, RCG) and formalisms based on context-free grammars but supplemented with attribute calculations, in particular for natural language processing (LFG formalism).
SYNTAX has been ported to various architectures, including 64-bit architectures, mainly by the former VASY team (Inria Rhône-Alpes), now CONVECS.
SYNTAX is distributed under a free licence: the SYNTAX library and the code produced by SYNTAX are under the CeCILL-C licence (similar to the LGPL), while the source code of the constructors is under the CeCILL licence (similar to the GPL).
In addition, SYNTAX now has an English Wikipedia page and a French Wikipedia page.
Historically, SYNTAX's main application was the construction of parsers for programming languages, in the field of compilation (C, Ada, etc.). It was therefore mainly a generator of deterministic context-free analysers, akin to the UNIX programs Lex and Yacc. The strength of SYNTAX as compared to these programs was then threefold:
Version 3.9 of SYNTAX is the culmination of research in the field of deterministic analysis. There is a user manual for SYNTAX version 3.5 [Boullier 88]. Most uses of SYNTAX for compiling programming languages rely on this technology, either in the form of SYNTAX 3.9 (Ada compiler, etc.) or in its evolved form in the current version of SYNTAX.
For the past twenty years, SYNTAX has turned to language processing applications. This has led to four types of development:
Versions 4 and 5 are not distributed. Version 3 was available free of charge for research purposes, and version 6 is freely available under the CeCILL and CeCILL-C licences, depending on the files (see above, and see the LICENSE file included in the package).
SYNTAX is developed almost exclusively in C (only a few small recent extensions use Perl). SYNTAX does not directly produce parsers from grammars. Instead, it produces a set of data represented in C, which must be compiled and linked with various general modules. These data are tables, i.e. initialised C arrays, and functions.
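To illustrate the general idea of "tables as initialised C arrays linked with a general module", here is a minimal sketch of a table-driven LR parser for the toy grammar S → ( S ) | x. It is not SYNTAX's actual table layout or library interface: the grammar, the table encoding and every name (`action_table`, `goto_table`, `rule_length`, `parse`) are assumptions made up for this example, and the tables were written by hand rather than generated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for the toy grammar S -> ( S ) | x. */
enum { T_LPAR, T_RPAR, T_X, T_EOF, T_COUNT };

/* Action encoding (an assumption of this sketch):
 *   n > 0  shift and go to state n
 *   n < 0  reduce by rule -n
 *   ACC    accept
 *   0      syntax error                                   */
enum { ACC = 99, N_STATES = 6, STACK_MAX = 64 };

/* "Generated" data: plain initialised C arrays. */
static const signed char action_table[N_STATES][T_COUNT] = {
    /*          (    )    x   EOF */
    /* 0 */ {   2,   0,   3,   0 },
    /* 1 */ {   0,   0,   0, ACC },
    /* 2 */ {   2,   0,   3,   0 },
    /* 3 */ {   0,  -2,   0,  -2 },   /* S -> x .     */
    /* 4 */ {   0,   5,   0,   0 },
    /* 5 */ {   0,  -1,   0,  -1 },   /* S -> ( S ) . */
};

/* State reached after reducing to S, indexed by the exposed state. */
static const signed char goto_table[N_STATES] = { 1, 0, 4, 0, 0, 0 };

/* Right-hand-side length of each rule (index 0 is unused). */
static const signed char rule_length[3] = { 0, 3, 1 };

/* Trivial one-character scanner for the toy language. */
static int token_of(char c)
{
    switch (c) {
    case '(':  return T_LPAR;
    case ')':  return T_RPAR;
    case 'x':  return T_X;
    case '\0': return T_EOF;
    default:   return -1;
    }
}

/* "General module": a generic driver that only interprets the tables.
 * Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int parse(const char *text)
{
    int stack[STACK_MAX];
    int top = 0;
    size_t pos = 0;

    stack[0] = 0;                          /* initial state */
    for (;;) {
        int tok = token_of(text[pos]);
        int act;

        if (tok < 0)
            return 0;                      /* unknown character */
        act = action_table[stack[top]][tok];

        if (act == ACC) {
            return 1;
        } else if (act > 0) {              /* shift */
            if (top + 1 >= STACK_MAX)
                return 0;                  /* input nested too deeply */
            stack[++top] = act;
            pos++;
        } else if (act < 0) {              /* reduce */
            int exposed;
            top -= rule_length[-act];
            assert(top >= 0);              /* holds if the tables are consistent */
            exposed = stack[top];
            stack[++top] = goto_table[exposed];
        } else {
            return 0;                      /* syntax error */
        }
    }
}
```

In this sketch, supporting another grammar would mean replacing the three arrays (and the scanner) while `parse` stays unchanged, which is the kind of separation between produced data and general modules described above.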
The core of SYNTAX consists mainly of three types of files:
Finally, SYNTAX is bootstrapped: the compilation of the constructors is achieved using SYNTAX itself (including said constructors).
SYNTAX is hosted on the Inria GForge. It should be migrated to Inria's GitLab shortly.
The latest packaged version of SYNTAX and previous packaged versions can be downloaded from the download page of the SYNTAX project on the Inria GForge.
For more information or if you have any questions, please contact benoit.sagot[at]inria.fr