antlr - ANother Tool for Language Recognition
antlr [options] grammar_files
Antlr converts an extended form of context-free grammar into a set of C
functions which directly implement an efficient form of deterministic
recursive-descent LL(k) parser. Context-free grammars may be augmented with
predicates to allow semantics to influence parsing; this allows a form of
context-sensitive parsing. Selective backtracking is also available to handle
non-LL(k) and even non-LALR(k) constructs.
Antlr also produces a definition of a lexer which can be automatically
converted into C code for a DFA-based lexer by dlg. Hence, antlr serves a
function much like that of yacc; however, it is notably more flexible and is
more integrated with a lexer generator (antlr directly generates dlg code,
whereas yacc and lex are given independent descriptions).
Unlike yacc, which accepts LALR(1) grammars, antlr accepts LL(k) grammars in
an extended BNF notation, which eliminates the need for precedence rules.
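As a rough illustration of why an extended BNF grammar needs no separate
precedence rules, the sketch below hand-codes a recursive-descent LL(1) parser
in C for a small expression grammar. It is not antlr output; the names
parse_expr, expr, term, factor, lookahead and cursor are hypothetical and
chosen only for this example. Precedence falls out of the rule layering:
expr calls term, and term calls factor, so "*" binds tighter than "+".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser state: a cursor into the input text. */
static const char *cursor;

/* Skip blanks and return the next character without consuming it.
   This is the single symbol of lookahead (k = 1). */
static int lookahead(void)
{
    while (*cursor == ' ')
        cursor++;
    return (unsigned char)*cursor;
}

static long expr(void);

/* factor : NUMBER | "(" expr ")" ; */
static long factor(void)
{
    long value = 0;
    if (lookahead() == '(') {
        cursor++;                       /* consume "(" */
        value = expr();
        if (lookahead() == ')')
            cursor++;                   /* consume ")" */
        return value;
    }
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* term : factor ( "*" factor )* ; */
static long term(void)
{
    long value = factor();
    while (lookahead() == '*') {
        cursor++;                       /* consume "*" */
        value *= factor();
    }
    return value;
}

/* expr : term ( ("+" | "-") term )* ; */
static long expr(void)
{
    long value = term();
    for (;;) {
        int op = lookahead();
        if (op != '+' && op != '-')
            return value;
        cursor++;                       /* consume the operator */
        if (op == '+')
            value += term();
        else
            value -= term();
    }
}

/* Entry point: evaluate a NUL-terminated expression string. */
long parse_expr(const char *text)
{
    assert(text != NULL);
    cursor = text;
    return expr();
}
```

Each function decides what to do by inspecting one symbol of lookahead, which
is the deterministic LL(1) behaviour described above. The sketch does no error
reporting; malformed input is simply parsed as far as possible.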
Like yacc grammars, antlr grammars can use automatically-maintained symbol
attribute values referenced as dollar variables. Further, because antlr
generates top-down parsers, arbitrary values may be inherited from parent
rules (passed like function parameters). Antlr also has a mechanism for
creating and manipulating abstract-syntax-trees.
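The idea of inherited values being "passed like function parameters" can be
sketched in hand-written C as follows. This is an assumption-laden
illustration, not code produced by antlr: the rule function list, its depth
parameter, and the helper max_nesting are hypothetical names. The parent rule
passes depth down to the child rule (an inherited value), and the child
returns the deepest nesting it saw (a value passed back up).

```c
#include <assert.h>

/* Hypothetical parser state: a cursor into the input text. */
static const char *cursor;

/* list[depth] : "[" ( list[depth + 1] )* "]" ;
   depth is inherited from the calling (parent) rule;
   the return value is the deepest nesting level found. */
static int list(int depth)
{
    int deepest = depth;

    assert(*cursor == '[');             /* caller guarantees a "[" here */
    cursor++;                           /* consume "[" */
    while (*cursor == '[') {
        int child = list(depth + 1);    /* pass the inherited value down */
        if (child > deepest)
            deepest = child;
    }
    if (*cursor == ']')
        cursor++;                       /* consume "]" */
    return deepest;
}

/* Entry point: text must begin with "[". */
int max_nesting(const char *text)
{
    cursor = text;
    return list(1);
}
```

Because a top-down parser calls the function for a child rule from inside the
function for its parent, any value the parent has already computed can be
handed to the child as an ordinary argument.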
There are various other niceties in antlr, including the ability to spread
one grammar over multiple files or even multiple grammars in a single file,
the ability to generate a version of the grammar with actions stripped out
(for documentation purposes), and lots more.
- -ck n
- Use up to n symbols of lookahead when using
compressed (linear approximation) lookahead. This type of lookahead is
very cheap to compute and is attempted before full LL(k) lookahead, which
is of exponential complexity in the worst case. In general, the compressed
lookahead can be much deeper (e.g., -ck 10) than the full lookahead (which
usually must be less than 4).
- -CC
- Generate C++ output from both ANTLR and DLG.
- -cr
- Generate a cross-reference for all rules. For each rule,
print a list of all other rules that reference it.
- -e1
- Ambiguities/errors shown in low detail (default).
- -e2
- Ambiguities/errors shown in more detail.
- -e3
- Ambiguities/errors shown in excruciating detail.
- -fe file
- Rename err.c to file.
- -fh file
- Rename stdpccts.h header (turns on -gh) to file.
- -fl file
- Rename lexical output, parser.dlg, to file.
- -fm file
- Rename file with lexical mode definitions, mode.h, to file.
- -fr file
- Rename file which remaps globally visible symbols, remap.h, to file.
- -ft file
- Rename tokens.h to file.
- -ga
- Generate ANSI-compatible code (default case). This has not
been rigorously tested to be ANSI X3J11 C compliant, but it is close. The
normal output of antlr is currently compilable under K&R C,
ANSI C, and C++. This option does nothing because antlr
generates a bunch of #ifdef's to do the right thing depending on the
language.
- -gc
- Indicates that antlr should generate no C code,
i.e., only perform analysis on the grammar.
- -gd
- C code is inserted in each of the antlr generated
parsing functions to provide for user-defined handling of a detailed parse
trace. The inserted code consists of calls to the user-supplied macros or
functions called zzTRACEIN and zzTRACEOUT. The only argument
is a char * pointing to a C-style string which is the grammar rule
recognized by the current parsing function. If no definition is given for
the trace functions, upon rule entry and exit, a message will be printed
indicating that a particular rule has been entered or exited.
- -ge
- Generate an error class for each non-terminal.
- -gh
- Generate stdpccts.h for non-ANTLR-generated files to
include. This file contains all defines needed to describe the type of
parser generated by antlr (e.g. how much lookahead is used and
whether or not trees are constructed) and contains the header
action specified by the user.
- -gk
- Generate parsers that delay lookahead fetches until needed.
Without this option, antlr generates parsers which always have
k tokens of lookahead available.
- -gl
- Generate line info about grammar actions in C parser of the
form
# line "file"
which makes error messages from the C/C++ compiler make more sense as they
will point into the grammar file, not the resulting C file. Debugging is
easier as well, because you will step through the grammar, not the C file.
- -gs
- Do not generate sets for token expression lists; instead
generate a ||-separated sequence of
LA(1)==token_number. The default is to generate sets.
- -gt
- Generate code for Abstract-Syntax Trees.
- -gx
- Do not create the lexical analyzer files (dlg-related).
This option should be given when the user wishes to provide a customized
lexical analyzer. It may also be used in make scripts to cause only
the parser to be rebuilt when a change not affecting the lexical structure
is made to the input grammars.
- -k n
- Set k of LL(k) to n; i.e., set the number of tokens of lookahead
(default==1).
- -o dir
- Directory where output files should go
(default="."). This is very nice for keeping the source
directory clear of ANTLR and DLG spawn.
- -p
- The complete grammar, collected from all input grammar
files and stripped of all comments and embedded actions, is listed to
stdout. This is intended to aid in viewing the entire grammar as a
whole and to eliminate the need to keep actions concisely stated so that
the grammar is easier to read. Hence, it is preferable to embed even
complex actions directly in the grammar, rather than to call them as
subroutines, since the subroutine call overhead will be saved.
- -pa
- This option is the same as -p except that the output
is annotated with the first sets determined from grammar analysis.
- -prc on
- Turn on the computation and hoisting of predicate
context.
- -prc off
- Turn off the computation and hoisting of predicate context.
This option makes 1.10 behave like the 1.06 release with option -pr
on. Context computation is off by default.
- -rl n
- Limit the maximum number of tree nodes used by grammar
analysis to n. Occasionally, antlr is unable to analyze a
grammar submitted by the user. This rare situation can only occur when the
grammar is large and the amount of lookahead is greater than one. A
nonlinear analysis algorithm is used by PCCTS to handle the general case
of LL(k) parsing. The average complexity of analysis, however, is near
linear due to some fancy footwork in the implementation which reduces the
number of calls to the full LL(k) algorithm. If this limit is reached, an
error message is displayed which indicates the grammar construct
being analyzed when antlr hit a non-linearity. Use this option if
antlr seems to go out to lunch and your disk starts thrashing; try
n=10000 to start. Once the offending construct has been identified,
try to remove the ambiguity that antlr was trying to overcome with
large lookahead analysis. The introduction of (...)? backtracking blocks
eliminates some of these problems, because antlr does not
analyze alternatives that begin with (...)? (it simply backtracks, if
necessary, at run time).
- -w1
- Set low warning level. Do not warn if semantic predicates
and/or (...)? blocks are assumed to cover ambiguous alternatives.
- -w2
- Ambiguous parsing decisions yield warnings even if semantic
predicates or (...)? blocks are used. Warn if predicate context is computed
and semantic predicates incompletely disambiguate alternative
productions.
- -
- Read grammar from standard input and generate
stdin.c as the parser file.
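The trace hooks described under -gd can be sketched as below. The macro names
zzTRACEIN and zzTRACEOUT and their single string argument come from the
option description; everything else is an assumption made for illustration.
In particular, the rule functions statement and expression are hand-written
stand-ins for generated parsing functions, and trace_in, trace_out and
run_trace_demo are hypothetical helpers. The exact way generated code invokes
the macros may differ.

```c
#include <assert.h>
#include <stdio.h>

static int trace_depth = 0;   /* current rule nesting */
static int deepest = 0;       /* deepest nesting seen so far */

/* User-defined handler for rule entry: print an indented message. */
static void trace_in(const char *rule)
{
    fprintf(stderr, "%*senter %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
    if (trace_depth > deepest)
        deepest = trace_depth;
}

/* User-defined handler for rule exit. */
static void trace_out(const char *rule)
{
    assert(trace_depth > 0);
    trace_depth--;
    fprintf(stderr, "%*sexit  %s\n", 2 * trace_depth, "", rule);
}

/* Supply the trace macros; the argument is the rule name as a C string. */
#define zzTRACEIN(rule)  trace_in(rule)
#define zzTRACEOUT(rule) trace_out(rule)

/* Stand-in for a generated parsing function for rule "expression". */
static void expression(void)
{
    zzTRACEIN("expression");
    /* token matching would happen here */
    zzTRACEOUT("expression");
}

/* Stand-in for a generated parsing function for rule "statement". */
static void statement(void)
{
    zzTRACEIN("statement");
    expression();
    zzTRACEOUT("statement");
}

/* Run the two stand-in rules and report the deepest nesting reached. */
int run_trace_demo(void)
{
    trace_depth = 0;
    deepest = 0;
    statement();
    return deepest;
}
```

Defining the macros in terms of ordinary functions keeps the handlers easy to
set breakpoints on; the indentation makes the rule nesting visible in the
trace written to stderr.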
Antlr works... we think. There is no implicit guarantee of anything. We
reserve no legal rights to the software known as the Purdue Compiler
Construction Tool Set (PCCTS); PCCTS is in the public domain. An
individual or company may do whatever they wish with source code distributed
with PCCTS or the code generated by PCCTS, including the incorporation of
PCCTS, or its output, into commercial software. We encourage users to develop
software with PCCTS. However, we do ask that credit is given to us for
developing PCCTS. By "credit", we mean that if you incorporate our
source code into one of your programs (commercial product, research project,
or otherwise) that you acknowledge this fact somewhere in the documentation,
research report, etc... If you like PCCTS and have developed a nice tool with
the output, please mention that you developed it using PCCTS. As long as these
guidelines are followed, we expect to continue enhancing this system and
expect to make other tools available as they are completed.
- *.c
- output C parser.
- *.cpp
- output C++ parser when C++ mode is used.
- parser.dlg
- output dlg lexical analyzer.
- err.c
- token string array, error sets and error support routines.
Not used in C++ mode.
- remap.h
- file that redefines all globally visible parser symbols.
The use of the #parser directive creates this file. Not used in C++
mode.
- stdpccts.h
- list of definitions needed by C files, not generated by
PCCTS, that reference PCCTS objects. This is not generated by default. Not
used in C++ mode.
- tokens.h
- output #defines for tokens used and function
prototypes for functions generated for rules.
dlg(1), pccts(1)