pitt.search.semanticvectors
Class CompareTermsBatch

java.lang.Object
  extended by pitt.search.semanticvectors.CompareTermsBatch

public class CompareTermsBatch
extends java.lang.Object

Command line utility for comparing term vectors, designed to be run in batch mode. It lets users obtain raw similarity scores between two concepts, where each concept is either a single word or a list of words. For example, if your vector file is the default termvectors.bin, you should be able to run comparisons like
echo 'blue | red green' | java pitt.search.semanticvectors.CompareTermsBatch
which will give you the cosine similarity of the "blue" vector with the sum of the "red" and "green" vectors.
The process can be set up to accept long lists of piped input without reloading the vectors for each comparison, and it can optionally hold the vectors in memory.
If the term NOT is used in one of the lists, subsequent terms in that list will be negated.
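
As a rough illustration of the comparison described above, the following sketch computes the cosine similarity between one vector and the sum of two others. The class name CosineSketch, its helper methods, and the three-dimensional vectors are made up for this example; they are not part of SemanticVectors, and the actual implementation may differ (for instance in normalization or term weighting). NOT handling is not shown.

```java
import java.util.Arrays;
import java.util.List;

/** Toy illustration only; not the SemanticVectors implementation. */
final class CosineSketch {

    /** Element-wise sum of equal-length vectors. */
    public static double[] sum(List<double[]> vectors) {
        double[] total = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < total.length; i++) {
                total[i] += v[i];
            }
        }
        return total;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical vectors standing in for "blue", "red" and "green".
        double[] blue = {1.0, 0.0, 1.0};
        double[] red = {1.0, 1.0, 0.0};
        double[] green = {0.0, 1.0, 1.0};

        // Equivalent in spirit to the query 'blue | red green'.
        double[] redPlusGreen = sum(Arrays.asList(red, green));
        System.out.println(cosine(blue, redPlusGreen));
    }
}
```

With these toy values the summed vector is (1, 2, 1), so the score is 2 / (sqrt(2) * sqrt(6)), roughly 0.577.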

Author:
Andrew MacKinlay
See Also:
Search

Constructor Summary
CompareTermsBatch()
           
 
Method Summary
static void main(java.lang.String[] args)
          Main function for command line use.
static void usage()
          Prints the usage message for command line use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CompareTermsBatch

public CompareTermsBatch()
Method Detail

usage

public static void usage()
Prints the following usage message:
Usage: java pitt.search.semanticvectors.CompareTermsBatch [-queryvectorfile vectorfile]
    [-luceneindexpath path_to_lucene_index]
    [-batchcompareseparator separator]
    [-vectorstorelocation loc]
-luceneindexpath argument may be used to get term weights from
    term frequency, doc frequency, etc. in lucene index.
-batchcompareseparator separator which is used to split each input line into strings of terms
    (default '|')
-vectorstorelocation: 'ram' or 'disk', for where to store vectors
For each line of input from STDIN, this will split the input into two strings
of terms at the separator, and output a similarity score to STDOUT.
If the term NOT is used in one of the lists, subsequent terms in
that list will be negated (as in Search class).
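
The line splitting described in the usage message could be sketched roughly as follows. The class BatchLineSketch and its parseLine helper are hypothetical names introduced for illustration, and splitting each side on whitespace is an assumption; the real class may parse its input differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Toy illustration only; not the SemanticVectors implementation. */
final class BatchLineSketch {

    /** Splits one input line into two term lists at the first separator. */
    public static List<List<String>> parseLine(String line, String separator) {
        // Pattern.quote is needed because the default '|' is a regex metacharacter.
        String[] halves = line.split(Pattern.quote(separator), 2);
        if (halves.length != 2) {
            throw new IllegalArgumentException("Expected a separator in: " + line);
        }
        return Arrays.asList(terms(halves[0]), terms(halves[1]));
    }

    /** Assumption: terms on each side are separated by whitespace. */
    private static List<String> terms(String side) {
        return Arrays.asList(side.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Prints [[blue], [red, green]]
        System.out.println(parseLine("blue | red green", "|"));
    }
}
```

Quoting the separator matters here because String.split takes a regular expression, and an unquoted '|' would be read as alternation instead of a literal character.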

See Also:
Search

main

public static void main(java.lang.String[] args)
                 throws java.lang.IllegalArgumentException
Main function for command line use.

Parameters:
args - command line arguments; see usage().
Throws:
java.lang.IllegalArgumentException