org.jacorb.idl

Class lexer

public class lexer extends Object

This class implements a scanner (aka lexical analyzer or lexer) for IDL. The scanner reads characters from a global input stream and returns integers corresponding to the terminal number of the next token. Once the end of input is reached the EOF token is returned on every subsequent call.

All symbol constants are defined in sym.java which is generated by JavaCup from parser.cup.

In addition to the scanner proper (called first via init() and then via next_token() to get each token), this class provides simple error and warning routines and keeps publicly accessible counts of errors and warnings. It also provides basic preprocessing facilities: it handles preprocessor directives such as #define, #undef, and #include, although it does not provide full C++ preprocessing. This class is "static" (i.e., it has only static members and methods).
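The calling convention described above (init() once, then next_token() repeatedly, with EOF returned on every call after the end of input) can be illustrated with a toy scanner. This is a minimal sketch and not the JacORB implementation: the class name StickyEofSketch, the token numbers EOF and WORD, and the String-based input are all assumptions made for illustration.

```java
/** Toy scanner showing the init()/next_token() protocol; hypothetical, not JacORB code. */
final class StickyEofSketch {
    // Hypothetical terminal numbers; the real ones are defined in the generated sym class.
    static final int EOF = 0;
    static final int WORD = 1;

    private static String input = "";
    private static int pos = 0;

    /** Resets the scanner state and installs the text to scan. */
    static void init(String text) {
        input = text;
        pos = 0;
    }

    /** Returns WORD for each whitespace-separated word, then EOF on every later call. */
    static int next_token() {
        // Skip leading whitespace.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return EOF; // pos no longer moves, so EOF is returned again on later calls
        }
        // Consume one word.
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return WORD;
    }
}
```

A caller would typically loop until the first EOF; calling next_token() again after that point is harmless in this sketch.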

Version: $Id: lexer.java,v 1.53 2006/10/13 19:56:48 andre.spiegel Exp $

Author: Gerald Brose

Field Summary
protected static Hashtable char_symbols
Table of single character symbols.
protected static boolean conditionalCompilation
static String currentFile
current file name
static String currentPragmaPrefix
currently active pragma prefix
protected static int current_line
Current line number for use in error messages.
protected static int current_position
Character position in current line.
protected static Hashtable defines
Defined symbols (preprocessor)
protected static int EOF_CHAR
EOF constant.
protected static boolean in_string
Have we already read a '"' ?
protected static Hashtable java_keywords
Table of Java reserved names.
protected static Hashtable keywords
Table of keywords.
protected static Hashtable keywords_lower_case
Table of keywords, stored in lower case.
protected static StringBuffer line
Current line for use in error messages.
protected static int next_char
First and second character of lookahead.
protected static int next_char2
static int warning_count
Count of warnings issued so far
protected static boolean wide
Are we processing a wide char or string ?
Method Summary
protected static void advance()
Advance the scanner one character in the input stream.
static String checkIdentifier(String str)
Checks whether Identifier str is legal and returns it.
static int currentLine()
record information about the last lexical scope so that it can be restored later
static void define(String symbol, String value)
static String defined(String symbol)
protected static token do_symbol()
Process an identifier.
static void emit_error(String message)
Emit an error message.
static void emit_error(String message, str_token t)
static void emit_warn(String message)
Emit a warning message.
static void emit_warn(String message, str_token t)
protected static int find_single_char(int ch)
Try to look up a single character symbol, returns -1 for not found.
static PositionInfo getPosition()
return the current reading position
protected static boolean id_char(int ch)
Determine if a character is ok for the middle of an id.
protected static boolean id_start_char(int ch)
Determine if a character is ok to start an id.
static void init()
Initialize the scanner.
static boolean needsJavaEscape(Module m)
static token next_token()
Return one token.
protected static void preprocess()
Preprocessor directives are handled here.
protected static token real_next_token()
The actual routine to return one token.
static void reset()
reset the scanner state
static void restorePosition(PositionInfo p)
static boolean strictJavaEscapeCheck(String s)
called during the parse phase to catch clashes with Java reserved words.
protected static void swallow_comment()
Handle swallowing up a comment.
static void undefine(String symbol)

Field Detail

char_symbols

protected static Hashtable char_symbols
Table of single character symbols. For ease of implementation, we store all unambiguous single character tokens in this table of Integer objects keyed by Integer objects with the numerical value of the appropriate char (currently Character objects have a bug which precludes their use in tables).

conditionalCompilation

protected static boolean conditionalCompilation

currentFile

public static String currentFile
current file name

currentPragmaPrefix

public static String currentPragmaPrefix
currently active pragma prefix

current_line

protected static int current_line
Current line number for use in error messages.

current_position

protected static int current_position
Character position in current line.

defines

protected static Hashtable defines
Defined symbols (preprocessor)

EOF_CHAR

protected static final int EOF_CHAR
EOF constant.

in_string

protected static boolean in_string
Have we already read a '"' ?

java_keywords

protected static Hashtable java_keywords
Table of Java reserved names.

keywords

protected static Hashtable keywords
Table of keywords. Keywords are initially treated as identifiers. Just before they are returned we look them up in this table to see if they match one of the keywords. The string of the name is the key here, which indexes Integer objects holding the symbol number.

keywords_lower_case

protected static Hashtable keywords_lower_case
Table of keywords, stored in lower case. Keys are the lower case version of the keywords used as keys for the keywords hash above, and the values are the case sensitive versions of the keywords. This table is used for detecting collisions of identifiers with keywords.
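The relationship between the two keyword tables can be sketched as follows. This is an illustration under stated assumptions, not the JacORB source: the class name KeywordCollisionSketch, the helper collidingKeyword, the three sample keywords, and their symbol numbers are hypothetical, and java.util.HashMap stands in for the Hashtable used by the real fields.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of a case-sensitive keyword table plus a lower-case index; hypothetical. */
final class KeywordCollisionSketch {
    // Keyword string -> symbol number (numbers here are made up).
    static final Map<String, Integer> keywords = new HashMap<>();
    // Lower-cased keyword -> keyword in its case-sensitive spelling.
    static final Map<String, String> keywords_lower_case = new HashMap<>();

    static {
        addKeyword("interface", 1);
        addKeyword("module", 2);
        addKeyword("struct", 3);
    }

    /** Registers a keyword in both tables. */
    static void addKeyword(String keyword, int symbol) {
        keywords.put(keyword, symbol);
        keywords_lower_case.put(keyword.toLowerCase(Locale.ROOT), keyword);
    }

    /**
     * Returns the keyword whose spelling matches the identifier when case is
     * ignored, or null if there is no such keyword.
     */
    static String collidingKeyword(String identifier) {
        return keywords_lower_case.get(identifier.toLowerCase(Locale.ROOT));
    }
}
```

With these sample entries, an identifier such as Interface maps back to the keyword interface, which a caller could then report as a collision.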

line

protected static StringBuffer line
Current line for use in error messages.

next_char

protected static int next_char
First and second character of lookahead.

next_char2

protected static int next_char2

warning_count

public static int warning_count
Count of warnings issued so far

wide

protected static boolean wide
Are we processing a wide char or string ?

Method Detail

advance

protected static void advance()
Advance the scanner one character in the input stream. This moves next_char2 to next_char and then reads a new next_char2.
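The two-character lookahead can be sketched as below. This is a minimal model, not the actual implementation: it reads from a String rather than the global input stream, the class name LookaheadSketch and the private read() helper are hypothetical, and the value -1 for EOF_CHAR is an assumption.

```java
/** Sketch of two-character lookahead over a String; hypothetical, not JacORB code. */
final class LookaheadSketch {
    static final int EOF_CHAR = -1; // assumed value for the EOF constant

    static String input = "";
    static int pos = 0;
    static int next_char = EOF_CHAR;
    static int next_char2 = EOF_CHAR;

    /** Returns the next character of the input, or EOF_CHAR once it is exhausted. */
    private static int read() {
        return pos < input.length() ? input.charAt(pos++) : EOF_CHAR;
    }

    /** Installs the input and reads the first two characters of lookahead. */
    static void init(String text) {
        input = text;
        pos = 0;
        next_char = read();
        next_char2 = read();
    }

    /** Moves next_char2 to next_char, then reads a new next_char2. */
    static void advance() {
        next_char = next_char2;
        next_char2 = read();
    }
}
```

Keeping two characters available lets a scanner recognize two-character sequences such as the start of a comment without consuming input it cannot put back.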

checkIdentifier

public static String checkIdentifier(String str)
Checks whether the identifier str is legal and returns it. If the identifier is escaped with a leading underscore, that underscore is removed. If the legal IDL identifier clashes with a Java reserved word, an underscore is prepended.

Parameters: str - the IDL identifier

Prints an error message if the identifier collides with an IDL keyword.
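The two underscore rules can be sketched as follows. This is one reading of the description, not the actual method: the class name IdentifierEscapeSketch is hypothetical, the set of Java reserved words is a small sample rather than the full java_keywords table, the rules are assumed to apply in the order given, and the IDL keyword collision check is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch of the underscore handling only; hypothetical, not JacORB code. */
final class IdentifierEscapeSketch {
    // A small sample of Java reserved words; the real table is larger.
    static final Set<String> JAVA_RESERVED =
            new HashSet<>(Arrays.asList("class", "package", "final", "import"));

    static String checkIdentifier(String str) {
        String name = str;
        // Rule 1: an escaped IDL identifier loses its leading underscore.
        if (name.startsWith("_")) {
            name = name.substring(1);
        }
        // Rule 2: a name that clashes with a Java reserved word gets an underscore.
        if (JAVA_RESERVED.contains(name)) {
            name = "_" + name;
        }
        return name;
    }
}
```

Under these assumptions, _interface becomes interface, class becomes _class, and an ordinary name such as count is returned unchanged.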

currentLine

public static int currentLine()
record information about the last lexical scope so that it can be restored later

define

public static void define(String symbol, String value)

defined

public static String defined(String symbol)

do_symbol

protected static token do_symbol()
Process an identifier.

Identifiers begin with a letter, underscore, or dollar sign, which is followed by zero or more letters, numbers, underscores, or dollar signs. This routine returns a str_token suitable for return by the scanner, or null if the string that was read expanded to a symbol that was #defined. In this case, the symbol is expanded in place.
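The identifier rule stated above can be sketched with two character predicates and a small scanning loop. This is an illustration only: the class name IdentifierScanSketch and the scanIdentifier helper are hypothetical, letters are assumed to be ASCII letters, and the expansion of #defined symbols is not modeled.

```java
/** Sketch of the identifier character rules; hypothetical, not JacORB code. */
final class IdentifierScanSketch {
    /** Letter, underscore, or dollar sign (ASCII letters assumed). */
    static boolean id_start_char(int ch) {
        return (ch >= 'a' && ch <= 'z')
                || (ch >= 'A' && ch <= 'Z')
                || ch == '_'
                || ch == '$';
    }

    /** Anything allowed at the start, plus digits. */
    static boolean id_char(int ch) {
        return id_start_char(ch) || (ch >= '0' && ch <= '9');
    }

    /** Returns the identifier starting at index start, or "" if none starts there. */
    static String scanIdentifier(String text, int start) {
        if (start >= text.length() || !id_start_char(text.charAt(start))) {
            return "";
        }
        int end = start + 1;
        while (end < text.length() && id_char(text.charAt(end))) {
            end++;
        }
        return text.substring(start, end);
    }
}
```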

emit_error

public static void emit_error(String message)
Emit an error message. The message will be marked with both the current line number and the position in the line. Error messages are printed on standard error (System.err).
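A sketch of how such a message might be assembled and counted is shown below. The exact message layout used by the real class is not documented here, so the format string, the class name DiagnosticsSketch, the format helper, and the error_count field name are all assumptions.

```java
/** Sketch of position-tagged diagnostics; hypothetical, not JacORB code. */
final class DiagnosticsSketch {
    static int error_count = 0;      // assumed field name
    static int warning_count = 0;
    static int current_line = 1;
    static int current_position = 1;

    /** Builds a message tagged with the current line and position (layout assumed). */
    static String format(String kind, String message) {
        return kind + " at line " + current_line
                + ", position " + current_position + ": " + message;
    }

    /** Prints the error on standard error and counts it. */
    static void emit_error(String message) {
        System.err.println(format("Error", message));
        error_count++;
    }

    /** Prints the warning on standard error and counts it. */
    static void emit_warn(String message) {
        System.err.println(format("Warning", message));
        warning_count++;
    }
}
```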

Parameters: message - the message to print.

emit_error

public static void emit_error(String message, str_token t)

emit_warn

public static void emit_warn(String message)
Emit a warning message. The message will be marked with both the current line number and the position in the line. Messages are printed on standard error (System.err).

Parameters: message - the message to print.

emit_warn

public static void emit_warn(String message, str_token t)

find_single_char

protected static int find_single_char(int ch)
Try to look up a single character symbol, returns -1 for not found.

Parameters: ch - the character in question.
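The lookup with its -1 convention can be sketched as follows. This is not the actual implementation: the class name SingleCharSketch and the symbol numbers are hypothetical (the real numbers come from the generated sym class), and java.util.HashMap stands in for the Hashtable used by char_symbols.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a single-character symbol table keyed by char value; hypothetical. */
final class SingleCharSketch {
    // Hypothetical terminal numbers.
    static final int SEMI = 1;
    static final int COMMA = 2;
    static final int LBRACE = 3;
    static final int RBRACE = 4;

    // Integer keys hold the numerical value of the character.
    static final Map<Integer, Integer> char_symbols = new HashMap<>();

    static {
        char_symbols.put((int) ';', SEMI);
        char_symbols.put((int) ',', COMMA);
        char_symbols.put((int) '{', LBRACE);
        char_symbols.put((int) '}', RBRACE);
    }

    /** Returns the symbol number for ch, or -1 if ch is not a single-character symbol. */
    static int find_single_char(int ch) {
        Integer result = char_symbols.get(ch);
        return result == null ? -1 : result;
    }
}
```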

getPosition

public static PositionInfo getPosition()
return the current reading position

id_char

protected static boolean id_char(int ch)
Determine if a character is ok for the middle of an id.

Parameters: ch - the character in question.

id_start_char

protected static boolean id_start_char(int ch)
Determine if a character is ok to start an id.

Parameters: ch - the character in question.

init

public static void init()
Initialize the scanner. This sets up the keywords and char_symbols tables and reads the first two characters of lookahead. "Object" is listed as reserved in the OMG spec. "int" is not, but I reserved it to bar its usage as a legal integer type.

needsJavaEscape

public static boolean needsJavaEscape(Module m)

next_token

public static token next_token()
Return one token. This is the main external interface to the scanner. It consumes sufficient characters to determine the next input token and returns it.

preprocess

protected static void preprocess()
Preprocessor directives are handled here.

real_next_token

protected static token real_next_token()
The actual routine to return one token.

Returns: token

Throws: java.io.IOException

reset

public static void reset()
reset the scanner state

restorePosition

public static void restorePosition(PositionInfo p)

strictJavaEscapeCheck

public static boolean strictJavaEscapeCheck(String s)
called during the parse phase to catch clashes with Java reserved words.

swallow_comment

protected static void swallow_comment()
Handle swallowing up a comment. Both old style C and new style C++ comments are handled.
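Skipping both comment styles can be sketched as below. This is a simplified model, not the actual routine: it works on a String and returns an index instead of advancing the scanner's lookahead, the class name CommentSketch and the swallowComment helper are hypothetical, and an unterminated C-style comment is assumed to run to the end of input.

```java
/** Sketch of skipping C and C++ style comments in a String; hypothetical. */
final class CommentSketch {
    /**
     * If a comment starts at pos, returns the index of the first character
     * after it; otherwise returns pos unchanged.
     */
    static int swallowComment(String text, int pos) {
        if (pos + 1 >= text.length() || text.charAt(pos) != '/') {
            return pos; // too short, or no leading slash
        }
        char second = text.charAt(pos + 1);
        if (second == '/') {
            // C++ style: runs to the end of the line, newline included.
            int i = pos + 2;
            while (i < text.length() && text.charAt(i) != '\n') {
                i++;
            }
            return i < text.length() ? i + 1 : i;
        }
        if (second == '*') {
            // C style: runs to the closing delimiter.
            int end = text.indexOf("*/", pos + 2);
            return end < 0 ? text.length() : end + 2;
        }
        return pos; // a slash that does not start a comment
    }
}
```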

undefine

public static void undefine(String symbol)