com.lowagie.text.pdf.hyphenation
public class HyphenationTree extends TernaryTree implements PatternConsumer
Field Summary | |
---|---|
protected TernaryTree | classmap
This map stores the character classes |
TernaryTree | ivalues
Temporary map to store interletter values on pattern loading. |
static long | serialVersionUID |
protected HashMap | stoplist
This map stores hyphenation exceptions |
protected ByteVector | vspace
value space: stores the interletter values |
Constructor Summary | |
---|---|
HyphenationTree() |
Method Summary | |
---|---|
void | addClass(String chargroup)
Add a character class to the tree. |
void | addException(String word, ArrayList hyphenatedword)
Add an exception to the tree. |
void | addPattern(String pattern, String ivalue)
Add a pattern to the tree. |
String | findPattern(String pat) |
protected byte[] | getValues(int k) |
protected int | hstrcmp(char[] s, int si, char[] t, int ti)
String compare, returns 0 if equal or
t is a substring of s |
Hyphenation | hyphenate(String word, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object. |
Hyphenation | hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object holding the hyphenation points. |
void | loadSimplePatterns(InputStream stream) |
protected int | packValues(String values)
Packs the values by storing each in 4 bits, two values to a byte.
Values range from 0 to 9. |
void | printStats() |
protected void | searchPatterns(char[] word, int index, byte[] il)
Search for all possible partial matches of word starting at index and update interletter values. |
protected String | unpackValues(int k) |
addClass(String chargroup)
Add a character class to the tree. It is used by
SimplePatternParser as a callback to
add character classes. Character classes define the
valid word characters for hyphenation. If a word contains
a character not defined in any of the classes, it is not hyphenated.
A class also defines a way to normalize the characters in order
to compare them with the stored patterns. Pattern files
usually use only lower case characters; in this case a class
for the letter 'a', for example, should be defined as "aA", the first
character being the normalization char.
addException(String word, ArrayList hyphenatedword)
Add an exception to the tree. It is used by the SimplePatternParser
class as a callback to
store the hyphenation exceptions.
Parameters:
word - normalized word
hyphenatedword - a vector of alternating strings and hyphen objects
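The normalization role of a character class can be illustrated with a small stand-alone sketch. It does not use the library: `CharClassSketch`, its `HashMap`-based `classmap`, and the `normalize` method are hypothetical names chosen for illustration, and the only behavior assumed is what the description above states (first character of the group is the normalization character, words with unknown characters are not hyphenated).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the library: shows how character
// classes such as "aA" can drive normalization.
final class CharClassSketch {
    private final Map<Character, Character> classmap = new HashMap<>();

    // The first character of the group is the normalization character.
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classmap.put(chargroup.charAt(i), norm);
        }
    }

    // Returns the normalized word, or null if some character belongs
    // to no class (such a word would not be hyphenated).
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = classmap.get(word.charAt(i));
            if (norm == null) {
                return null;
            }
            sb.append(norm);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharClassSketch c = new CharClassSketch();
        c.addClass("aA");
        c.addClass("bB");
        System.out.println(c.normalize("AbBa"));
        System.out.println(c.normalize("a-b"));
    }
}
```

With the classes "aA" and "bB" registered, mixed-case input is folded to lower case, while a word containing '-' is rejected because no class defines that character.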
addPattern(String pattern, String ivalue)
Add a pattern to the tree. It is used by the SimplePatternParser
class as a callback to
add a pattern to the tree.
Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters (i.e. '0' to '9').
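To make the relation between `pattern` and `ivalue` concrete, the sketch below splits a raw entry written in TeX-style notation (letters with embedded digits, such as "hy3ph") into the two strings. This is an assumption about how a pattern file entry might be prepared before the callback is invoked; `PatternSplitSketch` and its methods are hypothetical, and the sketch handles only single digits between letters.

```java
// Hypothetical helper, not part of the library.
final class PatternSplitSketch {
    // Letters of the entry with the digits removed, e.g. "hy3ph" -> "hyph".
    static String letters(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (!Character.isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // One digit per interletter position (before, between and after the
    // letters); positions without an explicit digit get '0'.
    static String interletterValues(String raw) {
        StringBuilder sb = new StringBuilder();
        boolean digitSeen = false; // a digit already fills the current position
        for (char c : raw.toCharArray()) {
            if (Character.isDigit(c)) {
                sb.append(c);
                digitSeen = true;
            } else {
                if (!digitSeen) {
                    sb.append('0');
                }
                digitSeen = false;
            }
        }
        if (!digitSeen) {
            sb.append('0'); // position after the last letter
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(letters("hy3ph") + " " + interletterValues("hy3ph"));
    }
}
```

A pattern of n letters always yields n + 1 interletter values, which is why the value string is one character longer than the pattern.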
hyphenate(String word, int remainCharCount, int pushCharCount)
Parameters:
word - the word to be hyphenated
remainCharCount - minimum number of characters allowed before the hyphenation point
pushCharCount - minimum number of characters allowed after the hyphenation point
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Parameters:
w - char array that contains the word
offset - offset to the first character in the word
len - length of the word
remainCharCount - minimum number of characters allowed before the hyphenation point
pushCharCount - minimum number of characters allowed after the hyphenation point
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
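The effect of remainCharCount and pushCharCount can be sketched independently of the library. The example assumes the usual convention of pattern-based hyphenation that an odd interletter value marks an allowed break; `HyphenPointSketch`, the `il` layout, and the sample values are made up for illustration and are not taken from a real pattern file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library.
final class HyphenPointSketch {
    // il has wordLength + 1 entries; il[i] rates the break before letter i.
    static List<Integer> points(int[] il, int wordLength,
                                int remainCharCount, int pushCharCount) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < wordLength; i++) {
            boolean allowed = (il[i] & 1) == 1;               // odd value: break allowed
            boolean enoughBefore = i >= remainCharCount;       // letters left on the line
            boolean enoughAfter = wordLength - i >= pushCharCount; // letters pushed down
            if (allowed && enoughBefore && enoughAfter) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values for an 11-letter word such as "hyphenation".
        int[] il = {0, 0, 3, 0, 0, 0, 5, 4, 0, 0, 1, 0};
        System.out.println(points(il, 11, 2, 2));
    }
}
```

Raising remainCharCount to 3 in this example drops the break after the second letter, and the candidate before the last letter is rejected whenever pushCharCount is 2 or more.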
packValues(String values)
Parameters:
values - a string of digits from '0' to '9' representing the interletter values
Returns:
the index into the vspace array where the packed values are stored.
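The idea of storing two 4-bit values per byte can be sketched as follows. This is not the library's encoding: the real method appends to vspace and returns an index, and its exact nibble order and termination scheme are not described here. `PackSketch`, the high-nibble-first layout, and the explicit `count` argument are assumptions made for the example.

```java
// Hypothetical helper, not part of the library.
final class PackSketch {
    // Packs digits '0'..'9' two per byte, first digit in the high nibble.
    static byte[] pack(String values) {
        byte[] out = new byte[(values.length() + 1) / 2];
        for (int i = 0; i < values.length(); i++) {
            int v = values.charAt(i) - '0';
            if (v < 0 || v > 9) {
                throw new IllegalArgumentException("not a digit: " + values.charAt(i));
            }
            if ((i & 1) == 0) {
                out[i / 2] = (byte) (v << 4); // even position: high nibble
            } else {
                out[i / 2] |= (byte) v;       // odd position: low nibble
            }
        }
        return out;
    }

    // Restores the first count digits from a packed array.
    static String unpack(byte[] packed, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            int b = packed[i / 2] & 0xFF;
            int v = (i & 1) == 0 ? (b >>> 4) : (b & 0x0F);
            sb.append((char) ('0' + v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("00300");
        System.out.println(packed.length + " bytes, round trip: " + unpack(packed, 5));
    }
}
```

Since every value fits in the range 0 to 9, a nibble is enough, and five interletter values occupy three bytes instead of five.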
searchPatterns(char[] word, int index, byte[] il)
Search for all possible partial matches of word starting at index and update interletter values. In other words, it does something like:
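for (i = 0; i < patterns.length; i++) {
    if (word.substring(index).startsWith(patterns[i]))
        update_interletter_values(patterns[i]);
}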
But it is done in an efficient way since the patterns are
stored in a ternary tree. In fact, this is the whole purpose
of having the tree: doing this search without having to test
every single pattern. The number of patterns for languages
such as English ranges from 4000 to 10000. Thus, doing thousands
of string comparisons for each word to hyphenate would be
really slow without the tree. The tradeoff is memory, but
using a ternary tree instead of a trie almost halves the
memory used by Lout or TeX. It's also faster than using
a hash table.
Parameters:
word - null terminated word to match
index - start index from word
il - interletter values array to update
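For comparison, the naive search that the ternary tree avoids could look like the sketch below. It tests every pattern against the word at the given index and merges the pattern's interletter values into il. `NaiveSearchSketch`, the map from pattern to value string, and the rule of keeping the larger value at each position are assumptions for illustration; the sample patterns are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the library: the slow, pattern-by-pattern
// search described above, without any tree.
final class NaiveSearchSketch {
    // il must have word.length() + 1 entries.
    static void searchPatterns(String word, int index,
                               Map<String, String> patterns, int[] il) {
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            if (!word.startsWith(e.getKey(), index)) {
                continue; // pattern is not a prefix of word[index..]
            }
            String ivalue = e.getValue();
            for (int j = 0; j < ivalue.length(); j++) {
                int v = ivalue.charAt(j) - '0';
                if (v > il[index + j]) {
                    il[index + j] = v; // keep the larger value (assumed rule)
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("hy", "003");
        patterns.put("h", "01");
        patterns.put("hen", "0020");
        String word = "hyphen";
        int[] il = new int[word.length() + 1];
        for (int i = 0; i < word.length(); i++) {
            searchPatterns(word, i, patterns, il);
        }
        System.out.println(java.util.Arrays.toString(il));
    }
}
```

Each call costs one comparison per pattern, so a word of n letters needs n times the pattern count in comparisons; a tree keyed on the pattern letters visits only the patterns that share a prefix with the word at that index.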