pitt.search.semanticvectors
Class CompareTermsBatch
java.lang.Object
pitt.search.semanticvectors.CompareTermsBatch
public class CompareTermsBatch
- extends java.lang.Object
Command-line utility for comparing term vectors in batch mode. It
reports the raw similarity between two concepts, each of which may be
an individual word or a list of words. For example, if your vector file
is the default termvectors.bin, you can run a comparison such as
echo 'blue | red green' | java pitt.search.semanticvectors.CompareTermsBatch
which will give you the cosine similarity of the "blue"
vector with the sum of the "red" and "green" vectors.
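The comparison described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class name CosineSketch, the helper methods, and the toy two-dimensional vectors are all hypothetical, and real term vectors would be read from the vector file.

```java
public class CosineSketch {

    // Element-wise sum of two vectors of equal length.
    static float[] add(float[] a, float[] b) {
        float[] sum = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product divided by the product of the norms.
    // Returns 0 when either vector has zero length (an assumption made
    // here to avoid dividing by zero).
    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for the "blue", "red" and "green" entries.
        float[] blue = {1f, 0f};
        float[] red = {1f, 0f};
        float[] green = {0f, 1f};

        // Similarity of "blue" with the sum of "red" and "green".
        System.out.println(cosine(blue, add(red, green)));
    }
}
```

With these toy values the summed vector is (1, 1), so the score is 1/sqrt(2), roughly 0.707.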
The process accepts long lists of piped input without reloading the
vectors for each comparison, and can hold the vectors in memory
(see the -vectorstorelocation option).
If the term NOT is used in one of the lists, subsequent terms in
that list will be negated.
- Author:
- Andrew MacKinlay
- See Also:
Search
Method Summary
static void main(java.lang.String[] args)
Main function for command line use.
static void usage()
Prints the following usage message:
Usage: java pitt.search.semanticvectors.CompareTermsBatch [-queryvectorfile vectorfile]
[-luceneindexpath path_to_lucene_index]
[-batchcompareseparator separator]
[-vectorstorelocation loc]
-luceneindexpath argument may be used to get term weights from
term frequency, doc frequency, etc.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CompareTermsBatch
public CompareTermsBatch()
usage
public static void usage()
- Prints the following usage message:
Usage: java pitt.search.semanticvectors.CompareTermsBatch [-queryvectorfile vectorfile]
[-luceneindexpath path_to_lucene_index]
[-batchcompareseparator separator]
[-vectorstorelocation loc]
-luceneindexpath argument may be used to get term weights from
term frequency, doc frequency, etc. in lucene index.
-batchcompareseparator separator which is used to split each input line into strings of terms
(default '|')
-vectorstorelocation: 'ram' or 'disk', for where to store vectors
For each line of input from STDIN, this will split the input into two strings
of terms at the separator, and output a similarity score to STDOUT.
If the term NOT is used in one of the lists, subsequent terms in
that list will be negated (as in Search class).
- See Also:
Search
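The per-line handling described in the usage message can be sketched as follows. This is only an illustration under stated assumptions: the class BatchLineSketch and its methods are hypothetical names, the rejection of lines that do not split into exactly two parts is an assumption, and the sketch only separates plain terms from terms following NOT. How the negated terms are then applied to the vectors is not covered here (see the Search class).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class BatchLineSketch {

    // Splits one input line into two term strings at the separator
    // (default '|'). Pattern.quote keeps '|' from being read as regex
    // alternation. Rejecting other shapes is an assumption of this sketch.
    static String[] splitLine(String line, String separator) {
        String[] parts = line.split(Pattern.quote(separator), -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                "Expected exactly one separator in: " + line);
        }
        return new String[] {parts[0].trim(), parts[1].trim()};
    }

    // Separates a whitespace-delimited term string into two lists:
    // index 0 holds the terms before NOT, index 1 holds the terms after it.
    static List<List<String>> partitionTerms(String termString) {
        List<String> positive = new ArrayList<>();
        List<String> negated = new ArrayList<>();
        boolean afterNot = false;
        for (String term : termString.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (term.equals("NOT")) {
                afterNot = true;
                continue;
            }
            (afterNot ? negated : positive).add(term);
        }
        return Arrays.asList(positive, negated);
    }

    public static void main(String[] args) {
        String[] halves = splitLine("blue | red NOT green", "|");
        System.out.println(partitionTerms(halves[0]));
        System.out.println(partitionTerms(halves[1]));
    }
}
```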
main
public static void main(java.lang.String[] args)
throws java.lang.IllegalArgumentException
- Main function for command line use.
- Parameters:
args - See usage().
- Throws:
java.lang.IllegalArgumentException