public class FSTCompletionLookup extends Lookup
An adapter from the Lookup API to FSTCompletion.
This adapter differs from FSTCompletion in that it attempts to discretize any "weights" passed in from TermFreqIterator.weight() to match the number of buckets. For the rationale behind bucketing, see FSTCompletion.
Note: Discretization requires an additional sorting pass.
The range of weights for bucketing/discretization is determined by sorting the input by weight and then dividing it into equal ranges. Scores within each range are then assigned to that range's bucket.
As a result, even large differences between individual weights may be lost during automaton construction, but the overall distinction between "classes" of weights is preserved regardless of how the weights are distributed.
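The bucketing described above can be sketched in plain Java. This is only an illustration, not the library's code: the class `BucketSketch` and the method `discretize` are hypothetical names, and the sketch assumes that "equal ranges" means equal-sized slices of the weight-sorted input (that is, bucketing by rank).

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the Lucene API.
final class BucketSketch {

    /**
     * Returns, for each input weight (in input order), the index of the
     * bucket it falls into. Assumption: the sorted input is cut into
     * `buckets` slices of (nearly) equal size, so the bucket depends on
     * the rank of a weight, not on its magnitude.
     */
    static int[] discretize(long[] weights, int buckets) {
        int n = weights.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // The additional sorting pass: order the input positions by weight.
        Arrays.sort(order, (a, b) -> Long.compare(weights[a], weights[b]));

        int[] result = new int[n];
        for (int rank = 0; rank < n; rank++) {
            // Map rank 0..n-1 onto bucket 0..buckets-1.
            result[order[rank]] = (int) ((long) rank * buckets / n);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] weights = {1, 2, 3, 1_000_000, 5, 4};
        // Prints [0, 0, 1, 2, 2, 1]
        System.out.println(Arrays.toString(discretize(weights, 3)));
    }
}
```

In this example the weights 5 and 1,000,000 land in the same bucket, which shows how a large difference between weights can be lost while the low, middle, and high classes stay distinct.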
For fine-grained control over which weights are assigned to which buckets, use FSTCompletion directly or TSTLookup, for example.
See also: FSTCompletion
Nested classes inherited from class Lookup: Lookup.LookupPriorityQueue, Lookup.LookupResult
Fields inherited from class Lookup: CHARSEQUENCE_COMPARATOR
Constructor | Description |
---|---|
FSTCompletionLookup() | This constructor prepares for creating a suggester FST using the build(TermFreqIterator) method. |
FSTCompletionLookup(FSTCompletion completion, boolean exactMatchFirst) | This constructor takes a pre-built automaton. |
FSTCompletionLookup(int buckets, boolean exactMatchFirst) | This constructor prepares for creating a suggester FST using the build(TermFreqIterator) method. |
Modifier and Type | Method | Description |
---|---|---|
void | build(TermFreqIterator tfit) | Builds up a new internal Lookup representation based on the given TermFreqIterator. |
java.lang.Object | get(java.lang.CharSequence key) | |
boolean | load(java.io.InputStream input) | Discard current lookup data and load it from a previously saved copy. |
java.util.List<Lookup.LookupResult> | lookup(java.lang.CharSequence key, boolean higherWeightsFirst, int num) | Look up a key and return possible completions for this key. |
boolean | store(java.io.OutputStream output) | Persist the constructed lookup data to the given output stream. |
public FSTCompletionLookup()
This constructor prepares for creating a suggester FST using the build(TermFreqIterator) method. The number of weight discretization buckets is set to FSTCompletion.DEFAULT_BUCKETS and exact matches are promoted to the top of the suggestions list.
public FSTCompletionLookup(int buckets, boolean exactMatchFirst)
This constructor prepares for creating a suggester FST using the build(TermFreqIterator) method.
Parameters:
buckets - The number of weight discretization buckets (see FSTCompletion for details).
exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise they appear in the order of discretized weight and alphabetically within the bucket.
public FSTCompletionLookup(FSTCompletion completion, boolean exactMatchFirst)
This constructor takes a pre-built automaton.
Parameters:
completion - An instance of FSTCompletion.
exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise they appear in the order of discretized weight and alphabetically within the bucket.
public void build(TermFreqIterator tfit) throws java.io.IOException
Builds up a new internal Lookup representation based on the given TermFreqIterator. The implementation might re-sort the data internally.
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, boolean higherWeightsFirst, int num)
Look up a key and return possible completions for this key.
Specified by:
lookup in class Lookup
Parameters:
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
higherWeightsFirst - return only more popular results
num - maximum number of results to return
public java.lang.Object get(java.lang.CharSequence key)
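public boolean store(java.io.OutputStream output) throws java.io.IOException
Persist the constructed lookup data to the given output stream.
public boolean load(java.io.InputStream input) throws java.io.IOException
Discard current lookup data and load it from a previously saved copy.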
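The result ordering described for the exactMatchFirst parameter, together with the num limit of lookup, can be illustrated with a small in-memory sketch. It does not use the automaton or any Lucene class: `ExactMatchOrderSketch`, `Entry`, and `complete` are hypothetical names, and the sketch assumes that entries in higher-numbered buckets are returned first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Lucene API.
final class ExactMatchOrderSketch {

    /** Hypothetical stand-in for a term stored with its discretized weight. */
    static final class Entry {
        final String term;
        final int bucket;

        Entry(String term, int bucket) {
            this.term = term;
            this.bucket = bucket;
        }
    }

    /**
     * Returns up to `num` terms starting with `prefix`. Terms are ordered by
     * bucket (assumption: higher bucket first), then alphabetically within a
     * bucket. If exactMatchFirst is true, a term equal to the prefix is
     * moved to the top regardless of its bucket.
     */
    static List<String> complete(List<Entry> entries, String prefix,
                                 boolean exactMatchFirst, int num) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.term.startsWith(prefix)) {
                matches.add(e);
            }
        }
        matches.sort((a, b) -> {
            if (exactMatchFirst) {
                boolean exactA = a.term.equals(prefix);
                boolean exactB = b.term.equals(prefix);
                if (exactA != exactB) {
                    return exactA ? -1 : 1;
                }
            }
            int byBucket = Integer.compare(b.bucket, a.bucket);
            return byBucket != 0 ? byBucket : a.term.compareTo(b.term);
        });
        List<String> result = new ArrayList<>();
        for (Entry e : matches) {
            if (result.size() == num) {
                break;
            }
            result.add(e.term);
        }
        return result;
    }
}
```

With the entries car (bucket 0), card (bucket 2), care (bucket 2), and cart (bucket 1), the key "car", and num set to 3, this sketch returns car, card, care when exactMatchFirst is true, and card, care, cart when it is false.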