pitt.search.semanticvectors
Class TermTermVectorsFromLucene
java.lang.Object
pitt.search.semanticvectors.TermTermVectorsFromLucene
- All Implemented Interfaces:
- VectorStore
public class TermTermVectorsFromLucene
- extends java.lang.Object
- implements VectorStore
Implementation of a vector store that creates term-by-term
co-occurrence vectors by iterating through all the documents in a
Lucene index. This class implements a sliding context window
approach, as used by Burgess and Lund (HAL) and Schütze, among
others. It uses a sparse representation for the basic document vectors,
which saves considerable space for collections with many individual
documents.
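The sliding-window approach can be illustrated without Lucene. The following is a minimal sketch, not this class's implementation. It assumes a single tokenized document given as a `List<String>`. Each distinct term receives a sparse random basic vector with `seedLength` nonzero entries, half +1 and half -1. Each term's vector then accumulates the basic vectors of the terms within `radius` positions on either side of it. The class name `SlidingWindowSketch`, the `radius` parameter (a per-side window width, which may not match how `windowSize` is interpreted here), the dimension, and the fixed random seed are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical, Lucene-free sketch of sliding-window term-term vectors. */
public class SlidingWindowSketch {

    /** Sparse basic vector: seedLength distinct nonzero entries, alternating +1 and -1. */
    static float[] basicVector(int dimension, int seedLength, Random random) {
        if (seedLength > dimension) {
            throw new IllegalArgumentException("seedLength must not exceed dimension");
        }
        float[] v = new float[dimension];
        int placed = 0;
        while (placed < seedLength) {
            int index = random.nextInt(dimension);
            if (v[index] != 0f) continue; // keep nonzero positions distinct
            v[index] = (placed % 2 == 0) ? 1f : -1f; // even seedLength => balanced signs
            placed++;
        }
        return v;
    }

    /** Adds the basic vectors of neighbors within `radius` positions to each focus term. */
    static Map<String, float[]> build(List<String> tokens, int dimension,
                                      int seedLength, int radius, long seed) {
        Random random = new Random(seed);
        Map<String, float[]> basic = new HashMap<>();
        Map<String, float[]> semantic = new HashMap<>();
        for (String t : tokens) {
            basic.computeIfAbsent(t, k -> basicVector(dimension, seedLength, random));
            semantic.computeIfAbsent(t, k -> new float[dimension]);
        }
        for (int focus = 0; focus < tokens.size(); focus++) {
            float[] target = semantic.get(tokens.get(focus));
            int start = Math.max(0, focus - radius);
            int end = Math.min(tokens.size() - 1, focus + radius);
            for (int ctx = start; ctx <= end; ctx++) {
                if (ctx == focus) continue; // a position is not its own context
                float[] b = basic.get(tokens.get(ctx));
                for (int i = 0; i < dimension; i++) {
                    target[i] += b[i];
                }
            }
        }
        return semantic;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "cat", "sat", "on", "the", "mat");
        Map<String, float[]> vectors = build(tokens, 200, 10, 2, 42L);
        System.out.println(vectors.keySet());
    }
}
```

A full index would repeat the accumulation over every document and field. HAL also weights neighbors by their distance from the focus term, whereas this sketch weights all neighbors inside the window equally.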
- Author:
- Trevor Cohen, Dominic Widdows.
Constructor Summary
TermTermVectorsFromLucene(java.lang.String indexDir,
int seedLength,
int minFreq,
int nonAlphabet,
int windowSize,
VectorStore basicTermVectors,
java.lang.String[] fieldsToIndex)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TermTermVectorsFromLucene
public TermTermVectorsFromLucene(java.lang.String indexDir,
int seedLength,
int minFreq,
int nonAlphabet,
int windowSize,
VectorStore basicTermVectors,
java.lang.String[] fieldsToIndex)
throws java.io.IOException,
java.lang.RuntimeException
- Parameters:
indexDir
- Directory containing the Lucene index.
seedLength
- Number of +1 or -1 entries in basic vectors. Should be even to give the same number of each.
minFreq
- The minimum term frequency for a term to be indexed.
windowSize
- The size of the sliding context window.
fieldsToIndex
- These fields will be indexed. If null, all fields will be indexed.
- Throws:
java.io.IOException
java.lang.RuntimeException
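The `minFreq` and `fieldsToIndex` parameters together act as a term filter. Below is a minimal sketch of that filtering. It assumes a term is kept when its frequency is at least `minFreq` and, when `fieldsToIndex` is non-null, its field appears in that array. `FieldTerm`, `accept`, and `acceptedTerms` are hypothetical helpers, not Lucene or SemanticVectors classes. The `nonAlphabet` parameter is not modeled because its meaning is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of filtering terms by minFreq and fieldsToIndex. */
public class TermFilterSketch {

    /** Stand-in for a Lucene term plus its frequency (not a Lucene class). */
    static final class FieldTerm {
        final String field;
        final String text;
        final int frequency;

        FieldTerm(String field, String text, int frequency) {
            this.field = field;
            this.text = text;
            this.frequency = frequency;
        }
    }

    /** Keeps a term if it is frequent enough and, when fields are listed, in one of them. */
    static boolean accept(FieldTerm term, int minFreq, String[] fieldsToIndex) {
        if (term.frequency < minFreq) {
            return false; // assumption: frequencies below minFreq are rejected
        }
        if (fieldsToIndex == null) {
            return true; // null means every field is indexed
        }
        return Arrays.asList(fieldsToIndex).contains(term.field);
    }

    /** Returns "field:text" labels for the terms that pass the filter. */
    static List<String> acceptedTerms(List<FieldTerm> terms, int minFreq, String[] fieldsToIndex) {
        List<String> kept = new ArrayList<>();
        for (FieldTerm t : terms) {
            if (accept(t, minFreq, fieldsToIndex)) {
                kept.add(t.field + ":" + t.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = List.of(
                new FieldTerm("contents", "cat", 12),
                new FieldTerm("contents", "zebu", 1),
                new FieldTerm("title", "cat", 4));
        // prints [contents:cat]
        System.out.println(acceptedTerms(terms, 3, new String[] {"contents"}));
    }
}
```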
getIndexReader
public org.apache.lucene.index.IndexReader getIndexReader()
- Returns:
- The object's indexReader.
getBasicTermVectors
public VectorStore getBasicTermVectors()
- Returns:
- The object's basicTermVectors.
getFieldsToIndex
public java.lang.String[] getFieldsToIndex()
- Returns:
- The object's fieldsToIndex.
getVector
public float[] getVector(java.lang.Object term)
- Specified by:
getVector
in interface VectorStore
- Parameters:
term
- the object whose vector you want to look up
- Returns:
- a vector (of floats)
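Term vectors such as those returned by `getVector` are commonly compared with cosine similarity. The helper below is a minimal, hypothetical sketch. It is not part of this class, and it assumes both arrays have the same dimension. It returns 0 when either vector is all zeros.

```java
/** Hypothetical cosine-similarity helper for float[] term vectors. */
public class CosineSketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // a zero vector has no direction; treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] x = {1f, 0f, 1f};
        float[] y = {1f, 1f, 0f};
        System.out.println(cosine(x, y));
    }
}
```

With a constructed store, a call might look like `cosine(store.getVector("cat"), store.getVector("dog"))`. What `getVector` returns for a term that is not in the store is not documented here, so a caller should check for that case before comparing.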
getAllVectors
public java.util.Enumeration getAllVectors()
- Specified by:
getAllVectors
in interface VectorStore
- Returns:
- an enumeration of all the object vectors in the store.
getNumVectors
public int getNumVectors()
- Specified by:
getNumVectors
in interface VectorStore
- Returns:
- a count of the number of vectors in the store.
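The three methods above (`getVector`, `getAllVectors`, `getNumVectors`) form a small lookup contract. Below is a minimal in-memory sketch of that contract, backed by a map. `SimpleVectorStore` is a hypothetical interface that mirrors only these three methods; it is not the real `VectorStore` interface. This sketch also enumerates plain `float[]` arrays, whereas the element type of the real enumeration ("object vectors") is not specified in this excerpt.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

public class InMemoryVectorStoreSketch {

    /** Hypothetical interface mirroring the three VectorStore methods documented above. */
    interface SimpleVectorStore {
        float[] getVector(Object term);
        Enumeration<float[]> getAllVectors();
        int getNumVectors();
    }

    /** Map-backed store; insertion order is preserved for enumeration. */
    static final class MapVectorStore implements SimpleVectorStore {
        private final Map<Object, float[]> vectors = new LinkedHashMap<>();

        void put(Object term, float[] vector) {
            vectors.put(term, vector);
        }

        @Override
        public float[] getVector(Object term) {
            return vectors.get(term); // null if the term is absent
        }

        @Override
        public Enumeration<float[]> getAllVectors() {
            return Collections.enumeration(vectors.values());
        }

        @Override
        public int getNumVectors() {
            return vectors.size();
        }
    }

    public static void main(String[] args) {
        MapVectorStore store = new MapVectorStore();
        store.put("cat", new float[] {1f, 0f});
        store.put("dog", new float[] {0.8f, 0.2f});
        Enumeration<float[]> all = store.getAllVectors();
        int seen = 0;
        while (all.hasMoreElements()) {
            all.nextElement();
            seen++;
        }
        System.out.println(store.getNumVectors() + " vectors, " + seen + " enumerated");
    }
}
```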