ANTLR 2.7.2 Release Notes
January 19, 2003
The ANTLR 2.7.2 release is a feature enhancement and bug fix release, partially brought to you by those hip cats at jGuru.com. It has been about two years since the last release, so expect lots of stuff to have been fixed and improved.
Enhancements
ANTLR 2.7.2 has a few enhancements:
- Added Oliver Zeigermann [oliver@zeigermann.de]'s rudimentary XML lexer (iterates through tags) to examples/java/xml directory.
- Added Marco Ladermann [Marco.Ladermann@gmx.de]'s jedit ANTLR mode to extras dir.
- ANTLR input itself was restricted previously to latin characters (\3..\177), but I have modified the antlr.g and other grammar files to include \3..\377. This will allow many Europeans to add accented characters (iso-8859-15 characters) to their ANTLR actions and rule names.
- Added a classHeaderPrefix option for all grammars. It replaces the default "public" and lets the user specify the prefix:
class TP extends TreeParser;
options {
classHeaderPrefix="public abstract";
}
-
Brian Smith's fix to MismatchedCharException handles EOF properly.
I augmented it to handle other special characters as well; for example, \n now shows up as '\n' rather than as ' followed by ' on the next line. Before and after:
$ java Main < /tmp/f
exception: line 1:7: expecting '
', found <EOF>
$ java Main < /tmp/f
exception: line 1:7: expecting '\n', found <EOF>
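The escaping idea can be sketched in a few lines of Java. This is only an illustration of the technique, not the code in MismatchedCharException; the class name CharEscapeSketch, the method printable, and the use of -1 as the EOF value are all assumptions made for the example.

```java
// Hypothetical helper, not ANTLR's actual implementation.
final class CharEscapeSketch {
    /** Returns a quoted, printable form of c for use in an error message. */
    static String printable(int c) {
        switch (c) {
            case -1:   return "<EOF>";   // assumed EOF sentinel
            case '\n': return "'\\n'";   // show the escape, not a raw line break
            case '\r': return "'\\r'";
            case '\t': return "'\\t'";
            default:   return "'" + (char) c + "'";
        }
    }

    public static void main(String[] args) {
        // Builds a message in the style of the second example above.
        System.out.println("expecting " + printable('\n') + ", found " + printable(-1));
    }
}
```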
- Added limited hoisting of predicates for lexer rules with one alternative. Any predicate on the left edge is hoisted into the nextToken prediction mechanism to gate predicated rules in and out. Documentation talks about how to match tokens only in column one and other context-sensitive situations.
-
Made $FIRST/$FOLLOW work in an action or exception handler. The argument can be a rule name:
a : A {$FIRST(a); $FIRST} B
exception
catch [MyExc e] {
foo = $FOLLOW(a);
foo = $FIRST(c);
}
;
You can also do $FIRST(a).member(LBRACK), etc.
-
Added getNumberOfChildren() to AST and to BaseAST. [POSSIBLE INCOMPATIBILITY as you might have implemented AST]
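If you implemented AST yourself, you will need to supply the new method. A minimal sketch, assuming a first-child/next-sibling node layout; NodeSketch and its fields down and right are hypothetical stand-ins for your own node class, not the BaseAST source:

```java
// Hypothetical node type using a first-child / next-sibling layout.
final class NodeSketch {
    NodeSketch down;   // first child, or null for a leaf
    NodeSketch right;  // next sibling, or null at the end of the list

    /** Appends child at the end of this node's child list. */
    void addChild(NodeSketch child) {
        if (down == null) {
            down = child;
            return;
        }
        NodeSketch t = down;
        while (t.right != null) {
            t = t.right;
        }
        t.right = child;
    }

    /** Counts immediate children by walking the sibling chain. */
    int getNumberOfChildren() {
        int n = 0;
        for (NodeSketch t = down; t != null; t = t.right) {
            n++;
        }
        return n;
    }
}
```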
-
added setter/getter for filename in Token
-
sem pred hoisting (limited) for lexer rules
-
ANTLR used to generate a crappy error message:
warning: found optional path in nextToken()
when a public lexer rule could be optional such as
B : ('b')? ;
I now say:
warning: public lexical rule B is optional (can match "nothing")
-
For unexpected and no viable alt exceptions, no file/line/column info was generated.
The exception was wrapped in a TokenStreamRecognitionException that did not delegate error message
handling (toString()) to the wrapped exception.
I added a TokenStreamRecognitionException.toString() method so that you now see things like
:1:1: unexpected char: '_'
instead of
antlr.TokenStreamRecognitionException: unexpected char: '_'
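The delegation itself is simple. The sketch below uses two hypothetical stand-in classes (RecognitionSketch and StreamRecognitionSketch) rather than the real ANTLR exceptions, and the file:line:column layout is assumed from the sample message above:

```java
// Hypothetical stand-ins for the wrapped and wrapping exception types.
class RecognitionSketch extends Exception {
    final String fileName;
    final int line;
    final int column;

    RecognitionSketch(String message, String fileName, int line, int column) {
        super(message);
        this.fileName = fileName;
        this.line = line;
        this.column = column;
    }

    /** Formats as file:line:column: message (an assumed layout). */
    @Override
    public String toString() {
        String file = (fileName == null) ? "" : fileName;
        return file + ":" + line + ":" + column + ": " + getMessage();
    }
}

class StreamRecognitionSketch extends Exception {
    final RecognitionSketch recog;

    StreamRecognitionSketch(RecognitionSketch recog) {
        super(recog.getMessage());
        this.recog = recog;
    }

    /** Delegates to the wrapped exception so position info is not lost. */
    @Override
    public String toString() {
        return recog.toString();
    }
}

class ToStringDelegationDemo {
    public static void main(String[] args) {
        RecognitionSketch inner = new RecognitionSketch("unexpected char: '_'", null, 1, 1);
        System.out.println(new StreamRecognitionSketch(inner));
    }
}
```

Without the override, the default Throwable.toString() prints the wrapper's class name and message only, which is the old behavior shown above.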
-
Ric added code to not write output files if they have not changed (Java and C++).
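One way to get this effect is to compare the new content with what is already on disk before writing. This is a sketch of the general idea only; it is not necessarily how the ANTLR tool does it, and the class and method names are made up for the example:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper illustrating "skip the write if nothing changed".
final class WriteIfChangedSketch {
    /** Writes content to target only if it differs; returns true if written. */
    static boolean writeIfChanged(Path target, String content) throws IOException {
        byte[] fresh = content.getBytes(StandardCharsets.UTF_8);
        if (Files.exists(target) && Arrays.equals(Files.readAllBytes(target), fresh)) {
            return false; // identical output: leave the file and its timestamp alone
        }
        Files.write(target, fresh);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("ParserSketch", ".java");
        System.out.println(writeIfChanged(p, "class P {}")); // file was empty, so it is written
        System.out.println(writeIfChanged(p, "class P {}")); // same content, so it is skipped
        Files.delete(p);
    }
}
```

Leaving the timestamp untouched is the point: make and similar tools will not rebuild code that depends on an unchanged generated file.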
-
Added default tab char handling; defaults to 8. Added methods to CharScanner:
public void setTabSize( int size );
public int getTabSize();
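To see what the tab size means for column numbers, here is a small self-contained sketch. TabColumnSketch is a hypothetical class, and the assumption is that columns are 1-based and a tab moves to the next tab stop; the real CharScanner may differ in detail.

```java
// Hypothetical column tracker; columns are assumed to be 1-based.
final class TabColumnSketch {
    private int tabSize = 8; // same default as described above
    private int column = 1;

    void setTabSize(int size) { tabSize = size; }
    int getTabSize() { return tabSize; }
    int getColumn() { return column; }

    /** Advances the column for one consumed character. */
    void consume(char c) {
        if (c == '\t') {
            // jump to the column just after the next multiple of tabSize
            column = ((column - 1) / tabSize + 1) * tabSize + 1;
        } else if (c == '\n') {
            column = 1;
        } else {
            column++;
        }
    }

    /** Convenience wrapper: consumes every character of text. */
    void consume(String text) {
        for (int i = 0; i < text.length(); i++) {
            consume(text.charAt(i));
        }
    }
}
```

With the default size, consuming "ab" followed by a tab leaves the column at 9; after setTabSize(4) the same input leaves it at 5.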
-
reformatted all code minus Cpp*.java using Intellij IDEA.
-
Made doEverything return a code instead of calling System.exit. Made a wrapper
to do the exit for command-line tools.
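The pattern looks roughly like this. ToolSketch, run, and runAndExit are hypothetical names chosen for the example; only the split between "return a code" and "exit in a wrapper" is taken from the note above.

```java
// Hypothetical tool class; shows the return-code / exit-wrapper split.
final class ToolSketch {
    /** Does the work and reports success (0) or failure (non-zero). */
    static int run(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ToolSketch file.g");
            return 1;
        }
        // ... process args[0] here ...
        return 0;
    }

    /** Command-line entry points call this; embedders call run() directly. */
    static void runAndExit(String[] args) {
        System.exit(run(args));
    }

    public static void main(String[] args) {
        // An embedding program (an IDE plug-in, say) can inspect the code
        // without having the whole JVM shut down underneath it.
        int code = run(new String[] { "file.g" });
        System.out.println("exit code would be " + code);
    }
}
```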
-
Made antlr use buffered IO for reading grammars; for java grammar went from 11 seconds to 6 seconds.
-
added reset() to ParserSharedInputState and Lexer too. init of CharQueue
is now public.
-
New Makefile setup. Everything can be built from toplevel up.
Changes for autoconf enabled make. Added install rules, toplevel
configure script.
-
Renamed $lookaheadSet to $FOLLOW. Added %FOLLOW to sather mode (hope it
works); original patch by Ernest Passour.
-
Clarified some error/warning messages.
-
Ported action.g fixes from C++ to Java action.g. Warnings and errors
in actions are now correctly reported in java mode as well.
-
Added reset methods to the Queue type objects (java/C++).
-
Added reset methods to the Input/TokenBuffer objects (java/C++).
-
Added reset methods to the SharedInputState objects (java/C++).
-
Added -h/-help/--help options. Adapted the year range in the copyright in
the help message.
-
Allow whitespace between $setxxx and following '('.
-
Give errors when target file/directory not writeable.
-
Changed the position of the init actions for
(..)* (..)+ to just inside the loop handling the closure. This way we
can check EOF conditions in the init action for each loop invocation and
generate much better error messages. [POSSIBLE INCOMPATIBILITY]
Java Code Generation
C++ Code Generation
- Mirrored tabsize handling from java.
- README updates
- Configure/Makefile changes from David Scott Page
distclean targets + general cleanups etc... (see his mail)
- Removed sstream dependencies from ASTFactory
- Ported change 625 to C++ mode. (currentAST bug)
- Fixed a Makefile for sather removal.
- Fixed: In the command-line options, the docs say to use "-traceTreeWalker".
Alas, the code insists that you use "-traceTreeParser". That was
annoying to figure out. :)
- Tested with 'Sun WorkShop 6 2000/08/30 C++ 5.1 Patch 109490-01'. A few
small fixes.
- Verified build with gcc 2.8.1 and gcc 2.95.3.
- Fixed typo in config.hpp added fixes for 2.8.1.
- Dropped dependency on sstream from ASTFactory
- Misc fixes for 2.8.1
- Added config for Digital Tru64 C++ compiler. (courtesy Andre Moll)
- MetroWerks Codewarrior fixes from Ruslan Zasukhin.
- Define ANTLR_CCTYPE_NEEDS_STD if isprint needs std:: (RZ)
- Define ANTLR_CXX_SUPPORTS_UNCAUGHT_EXCEPTION if std::uncaught_exception
is supported by compiler. (RZ)
- Made XML support configurable with ANTLR_SUPPORT_XML define. (RZ)
- Moved some methods back to header for better inlining. (RZ)
- Added getASTFactory to treeparser. Marked setASTNodeFactory as
deprecated, added setASTFactory to Parser (improve consistency).
- Removed down and right initializers from BaseAST copy constructor, they
wreak havoc in relation to dupTree. (forgot who reported this)
- Added missing initializer for factory in TreeParser constructor.
- Added the possibility to escape # characters. Added more preprocessor stuff
to be skipped. Changed error for ## into a warning.
- Some heterogeneous AST fixes.
- Made optimization of AST declarations constructions a little bit less
aggressive.
- Tightened up the generation of declarations for AST's.
- Updated a lot of #include "antlr/xx" to #include <antlr/xx>. Also
- Small addition for MSVC. (Jean-Daniel Fekete)
- Fixed missing 0 check in astfactory code.
- Also preprocess preheader actions and preambles for treegeneration code.
- Added to the C++ LexerSharedInputState an initialize function that
reinitializes the thing with a new stream.
- Bugfix: Initialized attribute filename a little bit earlier so error
message shows the filename instead of 'null'.
- tokenNames vector is now a simple array not a vector.
- Optimizations in Tracer classes (dumped string's). Removed setTokenNames
from the support library. Switched tokenNames to use a char* array.
- Generate NUM_TOKENS attribute in parsers. Added getNumTokens methods to
parsers.
- Changes in MismatchedTokenException to reflect the previous.
- More fixes for XML I/O (xml-ish actually). It's a bit tidier now. Some
too advanced things removed (ios_base::failure). Embedding custom XML
elements in the stream should be possible now.
- Bugfix: in case of a certain order of header actions (pre_include_xx etc.)
one header action might overwrite another. Probably only affects C++.
- Fix from Emir Uner for KAI C++: cast string literal to 'const
char*' for make_pair.
- Improved exception handling in trace routines of parser. Patch submitted
by John Fremlin. Tracer class now catches exceptions from lexer. Fixed
forgotten message in BitSet.cpp.
- Added implementations for getLAChars and getMarkedChars.
C# Code Generation
C# code generation added by Micheal Jordan, Kunle Odutola and Anthony
Oguntimehin
- Based initial C# generator and runtime model on Java to aid code reuse/portability
- Added support for specifying an enclosing C# namespace for generated lexers/parsers/treeparsers
- Patch from Scott Ellis to optimize _saveIndex variable creation (eliminates related unused-variable warnings)
- Incorporated Richard Ney's fixes for case-sensitive literals handling, TreeParser token-types classname and "unreachable code" warnings
- Added code to support better handling of C# preprocessor directives in action blocks
- Extensive revamp of heterogeneous AST handling to match description in manual
- Added initializeASTFactory(ASTFactory f) method to generated Parsers to facilitate flexible TreeParser factory initialization
- Changed a few more member names in the ongoing quest for full CLS-compliance for the ANTLR C# runtime assembly - xx_tokenSet_xx
- Generated C# lexers/parsers/treeparsers now support tracing if built with the -traceXXXX options
- BREAKING CHANGE: initializeASTFactory(ASTFactory f) is now a static member
- ANTLR C# now includes more than twice as many examples as during the alpha/beta programmes - all examples supplied with build-and-run NAnt build
- ASTFactory.dup(AST t) doesn't use object.Clone() and copy constructors any more. It now uses reflection to interrogate the parameter instance and create a new instance of its type.
- Support for heterogeneous AST greatly improved after receiving detailed bug reports and repro-grammars from Daniel Gackle on the ANTLR list.
Bug Fixes
-
Removed imports from default package in Main.java examples.
- Fixed k=0 value causing exception.
- Ambig refs to ast variables caused a NullPointerException. Now it says:
class ErrorMaker extends TreeParser;
root
: #( WHATEVER SEMI {echo(#SEMI); } SEMI {echo(#SEMI); } )
;
error: Ambiguous reference to AST element SEMI in rule root
Thanks to "Oleg Pavliv"
- From: steve hurt
The second bug occurs when a user wants to
organize a suite of grammar files into separate
directories. Due to a bug in the
tool it incorrectly forms the location of the
import/export vocabulary files. added a trim() to remove extra space.
- "Silvain Piree" gave me versions of Grammar*.java in preproc that used stringbuffers...much faster for inherited grammars.
- "Lloyd Dupont"
java grammar: the ESC rule allowed 0..9 instead of 0..7 in an escape starting with 4..7.
Also, floating-point literals were assumed to be float, not double; 3.0 was seen as a float.
- John Pybus john@pybus.org
sent in a major fix to handle f.g.super(); required rewrite of primary/postfix expression rules.
- put an "if GENAST" gate around import statements for AST types
in normal non-tree parsers.
- Thanks to Marco van Meegen
for his suggestion/help getting ANTLR
into shape for inclusion into eclipse IDE. I took his suggestion and
made the antlr.Tool object a simple variable reference so that multiple
kinds of Tool objects (such as one hooked into Eclipse) could be used
with ANTLR. This required simple changes but over *many* files!
- removed Sather support at the request of the supporter.
- add warning/error. Bad code gen with ^ or ! on tree root
when building trees in a tree walker grammar such as:
expr: #(PLUS^ expr expr)
| i:INT
;
Fortunately, ^ is simply redundant; removing it makes code ok.
Added a warning. Added an error message for ! saying that it
is not implemented.
- bug fix: incorrect code generation for #(. BLORT) in
tree walker grammar. Didn't properly handle the wildcard
root (missing _t==null check).
- bug fix. The lexer generator puts this assignment _after_ inserting
everything into the literals table: caseSensitiveLiterals = false;
Of course it needs to be before since ANTLRHashString depends on
it to calculate the hashCode. Not sure when this got fixed actually.
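Why the order matters can be shown with a small stand-alone sketch. LiteralsOrderSketch and its inner Key class are hypothetical and only mimic the situation described (a hash key whose hashCode depends on a case-sensitivity flag); they are not ANTLRHashString or the generated lexer code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lexer stand-in whose literal keys hash according to a flag.
final class LiteralsOrderSketch {
    boolean caseSensitiveLiterals = true;

    /** Key whose hash and equality depend on the owner's flag. */
    final class Key {
        final String s;
        Key(String s) { this.s = s; }
        private String norm() {
            return caseSensitiveLiterals ? s : s.toLowerCase();
        }
        @Override public int hashCode() { return norm().hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).norm().equals(norm());
        }
    }

    /** Fills the table, clearing the flag before or after, then looks up "begin". */
    static boolean lookupWorks(boolean flagSetFirst) {
        LiteralsOrderSketch lexer = new LiteralsOrderSketch();
        Map<Key, Integer> literals = new HashMap<>();
        if (flagSetFirst) {
            lexer.caseSensitiveLiterals = false;
        }
        literals.put(lexer.new Key("BEGIN"), 4); // hash is computed at insertion time
        if (!flagSetFirst) {
            lexer.caseSensitiveLiterals = false; // too late: the entry was hashed as "BEGIN"
        }
        return literals.containsKey(lexer.new Key("begin"));
    }
}
```

Here lookupWorks(true) finds the literal and lookupWorks(false) does not, because the table remembers the hash that was computed while the flag still had its old value.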
- Code gen bug fix: "if true {" could be generated sometimes in
the Lexer. I put (...) around an isolated true if it's generated
from JavaCodeGenerator.getLookaheadTestExpression.
- For large numbers of alternatives (>126) combined with syntactic predicates, there was a problem
whereby the syn pred testing code was not there. 2.7.1 introduced this problem. 2.7.2 has it right again.
- Removed syn pred testing gates on ast construction code; returnAST is ignored while in
try block while guessing. So, the tree construction in an invoked rule while guessing has no effect.
No need to test.
- Char ranges with ! on the alternative or range itself did not have the code necessary to delete the matched character from the token text.
- moved strip*(...) methods from Tool to StringUtils; updated mkjar accordingly.
- bug fix: a #(pippo) construct, which isn't allowed, caused a nullptr exception with kaffe.
It shouldn't get an exception. It now shows: "unexpected token: pippo" instead.
- a double ;; in antlr.g action and some stray semis were causing kjc to puke.
- the constructors of antlr/CharQueue.java and antlr/TokenQueue.java didn't check
for int overflow. They try to set queue size to the next higher power
of 2, which is not possible for all inputs (Integer.MAX_VALUE == 2^31-1). The
constructor loops forever for some inputs. Checked for huge size requests.
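A sketch of the problem and one possible guard follows. QueueSizeSketch is a hypothetical class, and rejecting oversized requests with an exception is a choice made for the example; the actual queue constructors may handle huge requests differently.

```java
// Hypothetical sizing helper; not the CharQueue/TokenQueue source.
final class QueueSizeSketch {
    /** Largest power of two that fits in a Java int (2^30). */
    static final int MAX_SIZE = 1 << 30;

    /** Returns the smallest power of two >= minSize (at least 2). */
    static int nextPowerOfTwo(int minSize) {
        // Without this check, a request above 2^30 makes size overflow to a
        // negative value and then to 0, and the loop below never terminates.
        if (minSize < 0 || minSize > MAX_SIZE) {
            throw new IllegalArgumentException("queue size out of range: " + minSize);
        }
        int size = 2;
        while (size < minSize) {
            size *= 2;
        }
        return size;
    }
}
```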
- The CharScanner.rewind(int) method did not rewind the column, just the input state. oops.
It now reads:
public void rewind(int pos) {
inputState.input.rewind(pos);
setColumn(inputState.tokenStartColumn); // ADDED
}
- Added warnings for labeled subrules.
- Robustified action.g - if currentRule = 0 a fitting error message is
printed.
ANTLR Installation
ANTLR comes as a single zip or compressed tar file. Unzipping the file you receive will produce a directory called antlr-2.7.2 with subdirectories antlr, doc, examples, cpp, and examples.cpp. You need to place the antlr-2.7.2 directory in your CLASSPATH environment variable. For example, if you placed antlr-2.7.2 in directory /tools, you need to append
/tools/antlr-2.7.2
to your CLASSPATH, or
\tools\antlr-2.7.2
if you work on Windoze.
References to antlr.* will map to /tools/antlr-2.7.2/antlr/*.class.
You must have at least JDK 1.1 installed properly on your machine. The ASTFrame AST viewer uses Swing 1.1.
JAR FILE
Try using the runtime library antlr.jar file. Place it in your CLASSPATH instead of the antlr-2.7.2 directory. The jar includes all parse-time files needed (it contains every .class file associated with ANTLR). You can run the antlr tool itself with the jar and your parsers.
RUNNING ANTLR
ANTLR is a command line tool (although many development environments let you run ANTLR on grammar files from within the environment). The main method within antlr.Tool is the ANTLR entry point.
java antlr.Tool file.g
One useful command-line option is -diagnostic, which generates a text file for each output parser class that describes the lookahead sets. Note that there are a number of options that you can specify at the grammar class and rule level.
Here are the command line arguments:
ANTLR Parser Generator Version 2.7.2rc1 (20021221) 1989-2002 jGuru.com
usage: java antlr.Tool [args] file.g
-o outputDir specify output directory where all output generated.
-glib superGrammar specify location of supergrammar file.
-debug launch the ParseView debugger upon parser invocation.
-html generate a html file from your grammar.
-docbook generate a docbook sgml file from your grammar.
-diagnostic generate a textfile with diagnostics.
-trace have all rules call traceIn/traceOut.
-traceLexer have lexer rules call traceIn/traceOut.
-traceParser have parser rules call traceIn/traceOut.
-traceTreeParser have tree parser rules call traceIn/traceOut.
-h|-help|--help this message
If you have trouble running ANTLR, ensure that you have Java installed correctly and then ensure that you have the appropriate CLASSPATH set.