org.argouml.util
Class MyTokenizer

java.lang.Object
  extended by org.argouml.util.MyTokenizer
All Implemented Interfaces:
java.util.Enumeration

public class MyTokenizer
extends java.lang.Object
implements java.util.Enumeration

Class for dividing a String into any number of parts. Each part will be a substring of the original String. The first part will at least contain the first character in the string. All following parts will at least contain the first character in the String not covered by any previous part.

The delim parameter to the constructors is a comma-separated list of tokens that should be recognized by the tokenizer. These tokens will be returned by the tokenizer as tokens, and any arbitrary text between them will also be returned as tokens. Since the comma has special meaning in this string, it can be escaped with \ to mean only itself (as in "\\,"). For technical reasons, no token in this list can be more than 32 characters long.
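The escaping rule above can be illustrated with a minimal sketch. It is not the MyTokenizer implementation: the class name DelimListSketch and the method parse are hypothetical, and skipping empty entries is an assumption rather than documented behavior. The 32-character limit is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of org.argouml.util.
class DelimListSketch {
    /** Splits a comma-separated delimiter list, treating "\x" as the literal character x. */
    static List<String> parse(String delim) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < delim.length(); i++) {
            char c = delim.charAt(i);
            if (c == '\\') {
                // Escape: take the next character literally, if there is one.
                if (i + 1 < delim.length()) {
                    current.append(delim.charAt(++i));
                }
            } else if (c == ',') {
                // An unescaped comma ends the current entry.
                // Assumption: empty entries are skipped.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under these assumptions, the Java literal " ,\\," parses into two entries: a single space and a single comma.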

In addition to the delim parameter, it is also possible to use custom separators, which allow any string that a program can recognize to be used as a delimiter.

There are some custom separators provided that you can use to get things like strings in one token. These cannot be used simultaneously by several tokenizers, i.e. they are not thread safe.

The tokenizer works in a greedy, first-match way. As soon as a separator token from delim is matched, or any CustomSeparator returns true from addChar, the tokenizer is satisfied that it has found a token and does NOT check whether it could have found a longer one. For example, with the delim string "<,<<", the token "<<" will never be found.
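The first-match behavior can be sketched as follows. This is a simplified stand-in, not the MyTokenizer algorithm: the class FirstMatchSketch and its tokenize method are hypothetical, custom separators are left out, and trying the delimiters in list order is an assumption about how "first" is decided.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of org.argouml.util.
class FirstMatchSketch {
    /** Splits text into delimiter tokens and the plain text between them. */
    static List<String> tokenize(String text, List<String> delims) {
        List<String> out = new ArrayList<>();
        int start = 0; // start of the pending plain-text run
        int i = 0;
        while (i < text.length()) {
            String hit = null;
            for (String d : delims) {
                // Accept the first delimiter that matches here;
                // a longer alternative is never considered.
                if (!d.isEmpty() && text.startsWith(d, i)) {
                    hit = d;
                    break;
                }
            }
            if (hit == null) {
                i++;
                continue;
            }
            if (i > start) {
                out.add(text.substring(start, i)); // text before the delimiter
            }
            out.add(hit);
            i += hit.length();
            start = i;
        }
        if (start < text.length()) {
            out.add(text.substring(start)); // trailing plain text
        }
        return out;
    }
}
```

With the delimiters "<" and "<<" in that order, tokenizing "a<<b" in this sketch gives "a", "<", "<", "b": the two-character delimiter is never produced.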

Example

 MyTokenizer tzer = new MyTokenizer("Hello, how are you?", " ,\\,");
 while (tzer.hasMoreTokens())
   System.out.println("\"" + tzer.nextToken() + "\"");
 

Which would yield the following output:

   "Hello"
   ","
   " "
   "how"
   " "
   "are"
   " "
   "you?"
 

Since:
0.11.2
See Also:
CustomSeparator

Field Summary
static CustomSeparator DOUBLE_QUOTED_SEPARATOR
          A custom separator for quoted strings enclosed in double quotes and using \ as escape character.
static CustomSeparator PAREN_EXPR_SEPARATOR
          A custom separator for expressions enclosed in parentheses, matching left parentheses with right parentheses.
static CustomSeparator PAREN_EXPR_STRING_SEPARATOR
          A custom separator for expressions enclosed in parentheses, matching left parentheses with right parentheses.
static CustomSeparator SINGLE_QUOTED_SEPARATOR
          A custom separator for quoted strings enclosed in single quotes and using \ as escape character.
 
Constructor Summary
MyTokenizer(java.lang.String string, java.lang.String delim)
          Constructs a new instance.
MyTokenizer(java.lang.String string, java.lang.String delim, java.util.Collection seps)
          Constructs a new instance.
MyTokenizer(java.lang.String string, java.lang.String delim, CustomSeparator sep)
          Constructs a new instance.
 
Method Summary
 int getTokenIndex()
          Returns the index in the string of the last token returned by nextToken, or zero if no token has been retrieved.
 boolean hasMoreElements()
          This class implements the Enumeration interface.
 boolean hasMoreTokens()
          Returns true if there are more tokens left.
 java.lang.Object nextElement()
          This class implements the Enumeration interface.
 java.lang.String nextToken()
          Retrieves the next token.
 void putToken(java.lang.String s)
          Put a token on the input stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SINGLE_QUOTED_SEPARATOR

public static final CustomSeparator SINGLE_QUOTED_SEPARATOR
A custom separator for quoted strings enclosed in single quotes and using \ as escape character. If the tokenizer reaches the end of the String first, the token may lack an end quote.


DOUBLE_QUOTED_SEPARATOR

public static final CustomSeparator DOUBLE_QUOTED_SEPARATOR
A custom separator for quoted strings enclosed in double quotes and using \ as escape character. If the tokenizer reaches the end of the String first, the token may lack an end quote.


PAREN_EXPR_SEPARATOR

public static final CustomSeparator PAREN_EXPR_SEPARATOR
A custom separator for expressions enclosed in parentheses, matching left parentheses with right parentheses. If the tokenizer reaches the end of the String first, the parentheses may not be properly matched. Do not use this together with PAREN_EXPR_STRING_SEPARATOR.


PAREN_EXPR_STRING_SEPARATOR

public static final CustomSeparator PAREN_EXPR_STRING_SEPARATOR
A custom separator for expressions enclosed in parentheses, matching left parentheses with right parentheses. If the tokenizer reaches the end of the String first, the parentheses may not be properly matched. Unlike PAREN_EXPR_SEPARATOR, it also takes quoted strings (in either single or double quotes) in the expression into consideration. Do not use this together with PAREN_EXPR_SEPARATOR.
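The difference between the two parenthesis separators can be sketched as below. This is only an illustration of the matching idea, not the CustomSeparator implementation: the class ParenExprSketch, the method endOfExpr and the honorQuotes flag are hypothetical, and treating \ as the escape character inside quotes is an assumption carried over from the quoted-string separators.

```java
// Hypothetical illustration, not part of org.argouml.util.
class ParenExprSketch {
    /**
     * Returns the index just past the ")" that matches the "(" at start,
     * or text.length() if the text ends before the match is found.
     * Precondition: text.charAt(start) is '('.
     */
    static int endOfExpr(String text, int start, boolean honorQuotes) {
        int depth = 0;
        char quote = 0; // 0 while outside a quoted string
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // assumption: \ escapes the next character
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (honorQuotes && (c == '\'' || c == '"')) {
                quote = c; // parentheses are ignored until this quote closes
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
        }
        return text.length(); // ran out of input: no proper match
    }
}
```

For the input (")")x, the sketch ends the expression after the first ")" when honorQuotes is false and after the second when it is true, which mirrors the documented distinction between PAREN_EXPR_SEPARATOR and PAREN_EXPR_STRING_SEPARATOR.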

Constructor Detail

MyTokenizer

public MyTokenizer(java.lang.String string,
                   java.lang.String delim)
Constructs a new instance. See above for a description of the delimiter string.

Parameters:
string - The String to be tokenized.
delim - The String of delimiters.

MyTokenizer

public MyTokenizer(java.lang.String string,
                   java.lang.String delim,
                   CustomSeparator sep)
Constructs a new instance. See above for a description of the delimiter string and custom separators.

Parameters:
string - The String to be tokenized.
delim - The String of delimiters.
sep - A custom separator to use.

MyTokenizer

public MyTokenizer(java.lang.String string,
                   java.lang.String delim,
                   java.util.Collection seps)
Constructs a new instance. See above for a description of the delimiter string and custom separators.

Parameters:
string - The String to be tokenized.
delim - The String of delimiters.
seps - A collection of custom separators to use.

Method Detail

hasMoreTokens

public boolean hasMoreTokens()
Returns true if there are more tokens left.

Returns:
true if another token can be fetched with nextToken.

nextToken

public java.lang.String nextToken()
Retrieves the next token.

Returns:
The next token.

nextElement

public java.lang.Object nextElement()
This class implements the Enumeration interface. This call maps to nextToken.

Specified by:
nextElement in interface java.util.Enumeration
Returns:
nextToken();
See Also:
nextToken

hasMoreElements

public boolean hasMoreElements()
This class implements the Enumeration interface. This call maps to hasMoreTokens.

Specified by:
hasMoreElements in interface java.util.Enumeration
Returns:
hasMoreTokens();
See Also:
hasMoreTokens

getTokenIndex

public int getTokenIndex()
Returns the index in the string of the last token returned by nextToken, or zero if no token has been retrieved.

Returns:
The index of the last token.

putToken

public void putToken(java.lang.String s)
Puts a token on the input stream. This will be the next token read from the tokenizer. If this method is called again before that token has been read, the earlier token is lost.

The index returned from getTokenIndex for a put token is the same as that of the last token that was not put.

Parameters:
s - The token to put.
Throws:
java.lang.NullPointerException - if s is null.
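
The single-slot behavior of putToken described above can be sketched as follows. The class PushbackSketch is hypothetical and wraps a plain Iterator instead of a real tokenizer; it models only the overwrite and read-next semantics, not getTokenIndex.

```java
import java.util.Iterator;

// Hypothetical illustration, not part of org.argouml.util.
class PushbackSketch {
    private final Iterator<String> source;
    private String put; // holds at most one put-back token

    PushbackSketch(Iterator<String> source) {
        this.source = source;
    }

    void putToken(String s) {
        if (s == null) {
            throw new NullPointerException("s is null");
        }
        put = s; // overwrites a token that was put earlier and not yet read
    }

    boolean hasMoreTokens() {
        return put != null || source.hasNext();
    }

    String nextToken() {
        if (put != null) {
            String s = put;
            put = null; // the slot is emptied once the token is read
            return s;
        }
        return source.next();
    }
}
```

In this sketch, calling putToken("x") and then putToken("y") before reading makes the next read return "y"; "x" is lost, and reading then continues with the underlying tokens.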


ArgoUML © 1996-2003 (20040125)