org.jacorb.idl

Class lexer


public class lexer
extends java.lang.Object

This class implements a scanner (aka lexical analyzer or lexer) for IDL. The scanner reads characters from a global input stream and returns integers corresponding to the terminal number of the next token. Once the end of input is reached, the EOF token is returned on every subsequent call.

All symbol constants are defined in sym.java which is generated by JavaCup from parser.cup.

In addition to the scanner proper (called first via init(), then via next_token() to get each token), this class provides simple error and warning routines and keeps a publicly accessible count of errors and warnings. It also provides basic preprocessing facilities: it handles preprocessor directives such as #define, #undef and #include, although it does not provide full C++ preprocessing. This class is "static" (i.e., it has only static members and methods).

Version:
$Id: lexer.java,v 1.48 2004/05/06 12:39:59 nicolas Exp $

Author:
Gerald Brose

Field Summary

protected static int
EOF_CHAR
EOF constant.
protected static Hashtable
char_symbols
Table of single character symbols.
protected static boolean
conditionalCompilation
static String
currentFile
current file name
static String
currentPragmaPrefix
currently active pragma prefix
protected static int
current_line
Current line number for use in error messages.
protected static int
current_position
Character position in current line.
protected static Hashtable
defines
Defined symbols (preprocessor)
protected static boolean
in_string
Have we already read a '"'?
protected static Hashtable
java_keywords
Table of Java reserved names.
protected static Hashtable
keywords
Table of keywords.
protected static Hashtable
keywords_lower_case
Table of keywords, stored in lower case.
protected static StringBuffer
line
Current line for use in error messages.
protected static int
next_char
First and second characters of lookahead.
protected static int
next_char2
static int
warning_count
Count of warnings issued so far
protected static boolean
wide
Are we processing a wide char or string?

Method Summary

protected static void
advance()
Advance the scanner one character in the input stream.
static String
checkIdentifier(String str)
Checks whether identifier str is legal and returns it.
static int
currentLine()
Record information about the last lexical scope so that it can be restored later.
static void
define(String symbol, String value)
static String
defined(String symbol)
protected static token
do_symbol()
Process an identifier.
static void
emit_error(String message)
Emit an error message.
static void
emit_error(String message, str_token t)
static void
emit_warn(String message)
Emit a warning message.
static void
emit_warn(String message, str_token t)
protected static int
find_single_char(int ch)
Try to look up a single character symbol, returns -1 for not found.
static PositionInfo
getPosition()
Return the current reading position.
protected static boolean
id_char(int ch)
Determine if a character is ok for the middle of an id.
protected static boolean
id_start_char(int ch)
Determine if a character is ok to start an id.
static void
init()
Initialize the scanner.
static boolean
needsJavaEscape(Module m)
static token
next_token()
Return one token.
protected static void
preprocess()
Preprocessor directives are handled here.
protected static token
real_next_token()
The actual routine to return one token.
static void
reset()
Reset the scanner state.
static void
restorePosition(PositionInfo p)
static boolean
strictJavaEscapeCheck(String s)
Called during the parse phase to catch clashes with Java reserved words.
protected static void
swallow_comment()
Handle swallowing up a comment.
static void
undefine(String symbol)

Field Details

EOF_CHAR

protected static final int EOF_CHAR
EOF constant.

Field Value:
-1


char_symbols

protected static Hashtable char_symbols
Table of single character symbols. For ease of implementation, we store all unambiguous single character tokens in this table of Integer objects keyed by Integer objects with the numerical value of the appropriate char (currently Character objects have a bug which precludes their use in tables).
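A minimal sketch of such a table, for illustration only. It assumes hypothetical symbol numbers (the real ones come from the JavaCup-generated sym class) and uses a generic HashMap in place of the raw Hashtable. Keys are int character codes boxed to Integer, as described above, and the lookup follows the contract documented for find_single_char(): return the symbol number, or -1 if the character is not a single-character token.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a char_symbols-style table.
// Symbol numbers are made up; JacORB takes them from sym.java.
class CharSymbolTable {
    static final int SEMI = 1;
    static final int LCBRACE = 2;
    static final int RCBRACE = 3;
    static final int COMMA = 4;

    // Keyed by the numeric value of the character (boxed to Integer).
    private static final Map<Integer, Integer> charSymbols = new HashMap<>();

    static {
        charSymbols.put((int) ';', SEMI);
        charSymbols.put((int) '{', LCBRACE);
        charSymbols.put((int) '}', RCBRACE);
        charSymbols.put((int) ',', COMMA);
    }

    // Return the symbol number for ch, or -1 if ch is not a
    // single-character token.
    static int findSingleChar(int ch) {
        Integer sym = charSymbols.get(ch);
        return sym == null ? -1 : sym;
    }
}
```

Because only unambiguous characters go into the table, a character such as ':' (which may start "::") would be handled by dedicated code in the scanner rather than by this lookup.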


conditionalCompilation

protected static boolean conditionalCompilation


currentFile

public static String currentFile
current file name


currentPragmaPrefix

public static String currentPragmaPrefix
currently active pragma prefix


current_line

protected static int current_line
Current line number for use in error messages.


current_position

protected static int current_position
Character position in current line.


defines

protected static Hashtable defines
Defined symbols (preprocessor)


in_string

protected static boolean in_string
Have we already read a '"'?


java_keywords

protected static Hashtable java_keywords
Table of Java reserved names.


keywords

protected static Hashtable keywords
Table of keywords. Keywords are initially treated as identifiers. Just before they are returned we look them up in this table to see if they match one of the keywords. The string of the name is the key here, which indexes Integer objects holding the symbol number.


keywords_lower_case

protected static Hashtable keywords_lower_case
Table of keywords, stored in lower case. Keys are the lower case version of the keywords used as keys for the keywords hash above, and the values are the case sensitive versions of the keywords. This table is used for detecting collisions of identifiers with keywords.
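The interplay of the two keyword tables can be sketched as follows. This is an illustrative assumption, not JacORB's code: the class name, the three-keyword subset, and the symbol numbers are hypothetical. A scanned name stays an identifier unless it matches a keyword exactly; the lower-case table then catches identifiers that differ from a keyword only in case.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the keywords / keywords_lower_case tables.
// Symbol numbers and the keyword subset are illustrative only.
class KeywordTables {
    static final int IDENTIFIER = 100;
    static final int MODULE = 101;
    static final int INTERFACE = 102;
    static final int STRUCT = 103;

    private static final Map<String, Integer> keywords = new HashMap<>();
    private static final Map<String, String> keywordsLowerCase = new HashMap<>();

    private static void addKeyword(String kw, int sym) {
        keywords.put(kw, sym);
        // Key: lower-case form; value: the case-sensitive keyword.
        keywordsLowerCase.put(kw.toLowerCase(Locale.ROOT), kw);
    }

    static {
        addKeyword("module", MODULE);
        addKeyword("interface", INTERFACE);
        addKeyword("struct", STRUCT);
    }

    // A name is an identifier unless it matches a keyword exactly.
    static int classify(String name) {
        Integer sym = keywords.get(name);
        return sym == null ? IDENTIFIER : sym;
    }

    // Return the keyword that identifier collides with when compared
    // case-insensitively (e.g. "Module" vs "module"), or null if none.
    static String collidingKeyword(String identifier) {
        String kw = keywordsLowerCase.get(identifier.toLowerCase(Locale.ROOT));
        return (kw != null && !kw.equals(identifier)) ? kw : null;
    }
}
```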


line

protected static StringBuffer line
Current line for use in error messages.


next_char

protected static int next_char
First and second characters of lookahead.


next_char2

protected static int next_char2


warning_count

public static int warning_count
Count of warnings issued so far


wide

protected static boolean wide
Are we processing a wide char or string?

Method Details

advance

protected static void advance()
            throws java.io.IOException
Advance the scanner one character in the input stream. This moves next_char2 to next_char and then reads a new next_char2.
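The two-character lookahead maintained by init() and advance() can be sketched as below. This is a hypothetical illustration: it uses instance fields and a java.io.Reader instead of the class's static fields and global input stream, and reuses -1 as the EOF marker, matching EOF_CHAR.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch of two-character lookahead: nextChar is the
// current character, nextChar2 the one after it.
class Lookahead {
    static final int EOF_CHAR = -1;

    private final Reader in;
    int nextChar;
    int nextChar2;

    Lookahead(Reader in) throws IOException {
        this.in = in;
        // Prime both lookahead slots, as init() is documented to do.
        nextChar = in.read();
        nextChar2 = in.read();
    }

    // Shift nextChar2 into nextChar, then read a fresh nextChar2.
    // Once EOF has been seen, keep returning EOF without reading again.
    void advance() throws IOException {
        nextChar = nextChar2;
        nextChar2 = (nextChar2 == EOF_CHAR) ? EOF_CHAR : in.read();
    }
}
```

Two characters of lookahead are enough to recognize two-character tokens such as "::", "<<" or the comment openers "/*" and "//" before committing to a token.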


checkIdentifier

public static String checkIdentifier(String str)
Checks whether identifier str is legal and returns it. If the identifier is escaped with a leading underscore, that underscore is removed. If the legal IDL identifier clashes with a Java reserved word, an underscore is prepended.

Parameters:
str - the IDL identifier

Prints an error message if the identifier collides with an IDL keyword.
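The three rules above can be sketched as follows. This is a hypothetical illustration, not JacORB's implementation: the keyword sets are small subsets, the IDL keyword check is assumed to be case-insensitive (in the spirit of keywords_lower_case), and the error is simply printed to System.err rather than reported through emit_error().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the checkIdentifier rules.
// Keyword sets are illustrative subsets only.
class IdentifierCheck {
    private static final Set<String> JAVA_RESERVED =
        new HashSet<>(Arrays.asList("class", "import", "package", "public", "new"));
    private static final Set<String> IDL_KEYWORDS =
        new HashSet<>(Arrays.asList("module", "interface", "struct", "string"));

    static String checkIdentifier(String str) {
        if (str.startsWith("_")) {
            // Escaped identifier: drop the leading underscore.
            str = str.substring(1);
        } else if (IDL_KEYWORDS.contains(str.toLowerCase(Locale.ROOT))) {
            // Unescaped collision with an IDL keyword: report it.
            System.err.println("Error: identifier collides with IDL keyword: " + str);
        }
        // A legal IDL name that is a Java reserved word gets a '_' prefix.
        if (JAVA_RESERVED.contains(str)) {
            str = "_" + str;
        }
        return str;
    }
}
```

For example, "_Account" becomes "Account", while both "package" and "_class" end up with a leading underscore in the generated Java name because they clash with Java reserved words.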


currentLine

public static int currentLine()
Record information about the last lexical scope so that it can be restored later.


define

public static void define(String symbol,
                          String value)


defined

public static String defined(String symbol)


do_symbol

protected static token do_symbol()
            throws java.io.IOException
Process an identifier.

Identifiers begin with a letter, underscore, or dollar sign, followed by zero or more letters, digits, underscores, or dollar signs. This routine returns a str_token suitable for return by the scanner, or null if the string that was read is a symbol that was #defined. In that case, the symbol is expanded in place.
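The identifier rules and the in-place #define expansion can be sketched as below. This is a hypothetical illustration: the class and method names are invented, the input is held in a StringBuilder so an expansion can be spliced back into the unread text, and a plain String stands in for str_token.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of do_symbol-style scanning with #define expansion.
class SymbolScanner {
    private final StringBuilder input;
    private int pos = 0;
    final Map<String, String> defines = new HashMap<>();

    SymbolScanner(String text) { input = new StringBuilder(text); }

    // Letter, underscore or dollar sign may start an identifier.
    static boolean idStartChar(int ch) {
        return Character.isLetter(ch) || ch == '_' || ch == '$';
    }

    // Digits are additionally allowed after the first character.
    static boolean idChar(int ch) {
        return idStartChar(ch) || Character.isDigit(ch);
    }

    // Scan one identifier at the current position (assumed to satisfy
    // idStartChar). Return it, or null if it was a #defined symbol whose
    // value has replaced it in the input, to be rescanned by the caller.
    String doSymbol() {
        int start = pos;
        while (pos < input.length() && idChar(input.charAt(pos))) {
            pos++;
        }
        String name = input.substring(start, pos);
        String value = defines.get(name);
        if (value == null) {
            return name;
        }
        input.replace(start, pos, value);
        pos = start;
        return null;
    }

    // Next unread character, or -1 at end of input.
    int peek() { return pos < input.length() ? input.charAt(pos) : -1; }
}
```

Returning null rather than a token lets the caller simply loop and rescan, so an expansion that itself starts with an identifier or a number is tokenized like ordinary input.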


emit_error

public static void emit_error(String message)
Emit an error message. The message will be marked with both the current line number and the position in the line. Error messages are printed on standard error (System.err).

Parameters:
message - the message to print.


emit_error

public static void emit_error(String message,
                              str_token t)


emit_warn

public static void emit_warn(String message)
Emit a warning message. The message will be marked with both the current line number and the position in the line. Messages are printed on standard error (System.err).

Parameters:
message - the message to print.


emit_warn

public static void emit_warn(String message,
                             str_token t)


find_single_char

protected static int find_single_char(int ch)
Try to look up a single character symbol, returns -1 for not found.

Parameters:
ch - the character in question.


getPosition

public static PositionInfo getPosition()
Return the current reading position.


id_char

protected static boolean id_char(int ch)
Determine if a character is ok for the middle of an id.

Parameters:
ch - the character in question.


id_start_char

protected static boolean id_start_char(int ch)
Determine if a character is ok to start an id.

Parameters:
ch - the character in question.


init

public static void init()
            throws java.io.IOException
Initialize the scanner. This sets up the keywords and char_symbols tables and reads the first two characters of lookahead. "Object" is listed as reserved in the OMG spec. "int" is not, but I reserved it to bar its usage as a legal integer type.


needsJavaEscape

public static boolean needsJavaEscape(Module m)


next_token

public static token next_token()
            throws java.io.IOException
Return one token. This is the main external interface to the scanner. It consumes sufficient characters to determine the next input token and returns it.
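The next_token() contract (consume just enough input for one token, and keep returning EOF once input is exhausted) can be sketched with a deliberately tiny, hypothetical scanner. The token kinds are illustrative ints rather than JacORB's token objects, and only identifiers and ';' are recognized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the next_token() contract.
class TinyScanner {
    static final int EOF = 0, IDENT = 1, SEMI = 2, ERROR = -1;

    private final String src;
    private int pos = 0;
    String lastText = "";

    TinyScanner(String src) { this.src = src; }

    int nextToken() {
        // Skip whitespace between tokens.
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
        if (pos >= src.length()) {
            return EOF; // sticky: every later call also returns EOF
        }
        char c = src.charAt(pos);
        if (c == ';') {
            pos++;
            lastText = ";";
            return SEMI;
        }
        if (Character.isLetter(c) || c == '_') {
            int start = pos;
            while (pos < src.length()
                    && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
                pos++;
            }
            lastText = src.substring(start, pos);
            return IDENT;
        }
        pos++; // consume the unexpected character so the scanner makes progress
        lastText = String.valueOf(c);
        return ERROR;
    }

    // Typical driver loop: pull tokens until EOF.
    static List<Integer> tokenize(String text) {
        TinyScanner s = new TinyScanner(text);
        List<Integer> kinds = new ArrayList<>();
        for (int t = s.nextToken(); t != EOF; t = s.nextToken()) {
            kinds.add(t);
        }
        return kinds;
    }
}
```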


preprocess

protected static void preprocess()
            throws java.io.IOException
Preprocessor directives are handled here.


real_next_token

protected static token real_next_token()
            throws java.io.IOException
The actual routine to return one token.

Returns:
token


reset

public static void reset()
Reset the scanner state.


restorePosition

public static void restorePosition(PositionInfo p)


strictJavaEscapeCheck

public static boolean strictJavaEscapeCheck(String s)
Called during the parse phase to catch clashes with Java reserved words.


swallow_comment

protected static void swallow_comment()
            throws java.io.IOException
Handle swallowing up a comment. Both C-style (/* ... */) and C++-style (// ...) comments are handled.
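Skipping both comment styles can be sketched on a String as follows. This is a hypothetical illustration rather than the scanner's stream-based code: it assumes the caller has already seen "/*" or "//" at index start, returns the index just past the comment, and throws on an unterminated C comment where the real lexer would presumably report an error.

```java
// Hypothetical sketch of comment skipping in the spirit of swallow_comment().
class CommentSkipper {
    static int skipComment(String src, int start) {
        if (src.startsWith("//", start)) {
            // C++ style: runs to end of line; the newline is left for the caller.
            int nl = src.indexOf('\n', start + 2);
            return nl < 0 ? src.length() : nl;
        }
        if (src.startsWith("/*", start)) {
            // C style: runs to the first "*/"; such comments do not nest.
            int end = src.indexOf("*/", start + 2);
            if (end < 0) {
                throw new IllegalStateException("unterminated comment");
            }
            return end + 2;
        }
        throw new IllegalArgumentException("no comment at index " + start);
    }
}
```

Searching for "*/" from start + 2 ensures that "/*/" is not mistaken for a complete comment.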


undefine

public static void undefine(String symbol)