it.unimi.dsi.mg4j.util
Class SignedMinimalPerfectHash

java.lang.Object
  extended by it.unimi.dsi.mg4j.index.AbstractTermMap
      extended by it.unimi.dsi.mg4j.util.MinimalPerfectHash
          extended by it.unimi.dsi.mg4j.util.SignedMinimalPerfectHash
All Implemented Interfaces:
TermMap, Serializable
Direct Known Subclasses:
HashCodeSignedMinimalPerfectHash, LiterallySignedMinimalPerfectHash, ShiftAddXorLongSignedMinimalPerfectHash, ShiftAddXorSignedMinimalPerfectHash

Deprecated. Use the hashing classes in Sux4J instead.

@Deprecated
public abstract class SignedMinimalPerfectHash
extends MinimalPerfectHash
implements Serializable

Signed order-preserving minimal perfect hash tables.

Minimal perfect hash tables will always return a result, even for terms that were not present in the collection indexed by the table. Sometimes you may prefer to single out terms that were not present in the collection.

For this purpose, MG4J provides signed minimal perfect hash tables. In a signed table, every term in the collection gets a signature that is used to detect false positives. Signatures range from the simple hashcode-based signatures provided by the HashCodeSignedMinimalPerfectHash class, to sophisticated cryptographic signatures, to (at the other extreme) a class that actually stores the terms (and thus completely avoids false positives), such as LiterallySignedMinimalPerfectHash.

A signed table extends this class and provides the signature methods: an initSignatures(Iterable) method that sets up the necessary data structures, and a checkSignature(CharSequence,int) method (together with its byte-array counterpart, checkSignature(byte[],int,int,int)) that checks a given term against the signature stored for the term with a given index.

It is good practice, of course, to replicate the constructors of this class in all implementing subclasses (by simply invoking super with the same arguments). Moreover, to be useful, classes extending this class must be serialisable.
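The signing scheme described above can be illustrated with a minimal, self-contained sketch. It is not MG4J code: the class SignedLookupSketch, the helper standIn and the use of a ToIntFunction as the unsigned table are all assumptions made for illustration. The stand-in behaves like an unsigned table in that it returns some valid index even for unknown terms; the hashcode signatures then filter out most of those false positives.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical illustration of a signed table; it does not use the MG4J classes. */
class SignedLookupSketch {
    private final ToIntFunction<CharSequence> unsigned; // always returns some index
    private int[] signature;                            // one signature per term

    SignedLookupSketch(List<? extends CharSequence> terms, ToIntFunction<CharSequence> unsigned) {
        this.unsigned = unsigned;
        initSignatures(terms);
    }

    /** Computes and stores a hashcode-based signature for every term. */
    private void initSignatures(List<? extends CharSequence> terms) {
        signature = new int[terms.size()];
        for (int i = 0; i < signature.length; i++) {
            signature[i] = terms.get(i).toString().hashCode();
        }
    }

    /** Checks the signature of a term against the one stored at the given index. */
    private boolean checkSignature(CharSequence term, int index) {
        return signature[index] == term.toString().hashCode();
    }

    /** Returns the index of the term, or -1 if its signature does not match. */
    int getNumber(CharSequence term) {
        int i = unsigned.applyAsInt(term);
        return checkSignature(term, i) ? i : -1;
    }

    /** Stand-in for an unsigned table: correct for known terms, arbitrary otherwise. */
    static ToIntFunction<CharSequence> standIn(List<String> terms) {
        return t -> {
            int i = terms.indexOf(t.toString());
            return i >= 0 ? i : Math.floorMod(t.toString().hashCode(), terms.size());
        };
    }
}
```

With hashcode signatures a false positive is still possible when an unknown term happens to share the hash code of the term stored at the returned index; only a scheme that stores the terms themselves rules this out entirely.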

Since:
0.4
Author:
Sebastiano Vigna, Marco Olivo
See Also:
Serialized Form

Field Summary
static long serialVersionUID
          Deprecated.  
 
Fields inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash
ENLARGEMENT_FACTOR, g, init, m, n, n4, NODE_OVERHEAD, rightShift, t, TERM_THRESHOLD, WEIGHT_UNKNOWN, WEIGHT_UNKNOWN_SORTED_TERMS, weight0, weight1, weight2, weightLength
 
Constructor Summary
SignedMinimalPerfectHash(Iterable<? extends CharSequence> terms)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the given terms, using as many weights as the longest term in the collection.
SignedMinimalPerfectHash(Iterable<? extends CharSequence> terms, int weightLength)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the given terms using the given number of weights.
SignedMinimalPerfectHash(String termFile, String encoding)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the given file of terms.
SignedMinimalPerfectHash(String termFile, String encoding, boolean zipped)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the (possibly gzip'd) given file of terms.
SignedMinimalPerfectHash(String termFile, String encoding, int weightLength)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the given file of terms using the given number of weights.
SignedMinimalPerfectHash(String termFile, String encoding, int weightLength, boolean zipped)
          Deprecated. Creates a new signed order-preserving minimal perfect hash table for the (possibly gzip'd) given file of terms using the given number of weights.
 
Method Summary
 MinimalPerfectHash asUnsigned()
          Deprecated. Returns an unsigned view of this signed minimal perfect hash.
protected abstract  boolean checkSignature(byte[] a, int off, int len, int index)
          Deprecated. Checks a signature against a byte-array fragment.
protected abstract  boolean checkSignature(CharSequence term, int index)
          Deprecated. Checks a signature against a character sequence.
 int getNumber(byte[] a, int off, int len)
          Deprecated. Hashes a term given as a byte-array fragment interpreted in the ISO-8859-1 charset encoding.
 int getNumber(CharSequence term)
          Deprecated. Hashes a given term.
 int getNumber(MutableString term)
          Deprecated. Hashes a given term.
protected abstract  void initSignatures(Iterable<? extends CharSequence> terms)
          Deprecated. Sets up the signature system from a collection.
static void main(String[] arg)
          Deprecated.  
 
Methods inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash
getFromT, getNumber, hash, hasTerms, main, size, weightLength
 
Methods inherited from class it.unimi.dsi.mg4j.index.AbstractTermMap
getIndex, getTerm, getTerm
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

public static final long serialVersionUID
Deprecated. 
See Also:
Constant Field Values

Constructor Detail

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(Iterable<? extends CharSequence> terms)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the given terms, using as many weights as the longest term in the collection.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
terms - some terms to hash; it is assumed that they do not contain duplicates.
See Also:
MinimalPerfectHash.MinimalPerfectHash(Iterable)

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(Iterable<? extends CharSequence> terms,
                                int weightLength)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the given terms using the given number of weights.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
terms - some terms to hash; it is assumed that no terms share a common prefix of weightLength characters.
weightLength - the number of weights used in generating the intermediate hash functions.
See Also:
MinimalPerfectHash.MinimalPerfectHash(Iterable, int)

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(String termFile,
                                String encoding,
                                int weightLength,
                                boolean zipped)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the (possibly gzip'd) given file of terms using the given number of weights.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
termFile - a file containing one term on each line; it is assumed that it does not contain terms with a common prefix of weightLength characters.
encoding - the encoding of termFile; if null, it is assumed to be the platform default encoding.
weightLength - the number of weights used in generating the intermediate hash functions.
zipped - if true, the provided file is gzip-compressed and will be opened using a GZIPInputStream.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String,String,int,boolean)
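The parameter conventions above (one term per line, a null encoding meaning the platform default, and an optional GZIPInputStream wrapper) can be sketched as follows. This is only an illustration of those conventions, not the MG4J implementation; the class TermFileSketch and its readTerms methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper that reads a term list the way the parameters describe. */
class TermFileSketch {
    /** Reads one term per line from a stream, optionally decompressing it. */
    static List<String> readTerms(InputStream raw, String encoding, boolean zipped) throws IOException {
        // A null encoding falls back to the platform default charset.
        Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
        InputStream in = zipped ? new GZIPInputStream(raw) : raw;
        List<String> terms = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            for (String line; (line = reader.readLine()) != null; ) {
                terms.add(line);
            }
        }
        return terms;
    }

    /** Same as above, starting from a file name. */
    static List<String> readTerms(String termFile, String encoding, boolean zipped) throws IOException {
        try (InputStream raw = new FileInputStream(termFile)) {
            return readTerms(raw, encoding, zipped);
        }
    }
}
```

A list obtained this way could then be passed to one of the Iterable-based constructors of a concrete subclass.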

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(String termFile,
                                String encoding,
                                boolean zipped)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the (possibly gzip'd) given file of terms.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
termFile - a file containing one term on each line; it is assumed that it does not contain duplicates.
encoding - the encoding of termFile; if null, it is assumed to be the platform default encoding.
zipped - if true, the provided file is gzip-compressed and will be opened using a GZIPInputStream.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String,String,boolean)

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(String termFile,
                                String encoding,
                                int weightLength)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the given file of terms using the given number of weights.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
termFile - a file containing one term on each line; it is assumed that it does not contain terms with a common prefix of weightLength characters.
encoding - the encoding of termFile; if null, it is assumed to be the platform default encoding.
weightLength - the number of weights used in generating the intermediate hash functions.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String,String,int)

SignedMinimalPerfectHash

public SignedMinimalPerfectHash(String termFile,
                                String encoding)
Deprecated. 
Creates a new signed order-preserving minimal perfect hash table for the given file of terms.

After calling the corresponding constructor of MinimalPerfectHash, this constructor will invoke initSignatures(Iterable).

Parameters:
termFile - a file containing one term on each line; it is assumed that it does not contain duplicates.
encoding - the encoding of termFile; if null, it is assumed to be the platform default encoding.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String,String)

Method Detail

getNumber

public int getNumber(CharSequence term)
Deprecated. 
Hashes a given term.

Specified by:
getNumber in interface TermMap
Overrides:
getNumber in class MinimalPerfectHash
Parameters:
term - a term to hash.
Returns:
the position of the given term in the generating collection, starting from 0, if the term was in the original collection; otherwise, -1.

getNumber

public int getNumber(MutableString term)
Deprecated. 
Hashes a given term.

Overrides:
getNumber in class MinimalPerfectHash
Parameters:
term - a term to hash.
Returns:
the position of the given term in the generating collection, starting from 0, if the term was in the original collection; otherwise, -1.

getNumber

public int getNumber(byte[] a,
                     int off,
                     int len)
Deprecated. 
Hashes a term given as a byte-array fragment interpreted in the ISO-8859-1 charset encoding.

Overrides:
getNumber in class MinimalPerfectHash
Parameters:
a - a byte array.
off - the first valid byte in a.
len - the number of bytes composing the term, starting at off.
Returns:
the position of term defined by len bytes starting at off (interpreted as ISO-8859-1 characters) in the generating collection, starting from 0, if the term was in the original collection; otherwise, -1.
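To make the byte-array convention concrete, the sketch below shows which term a fragment denotes when its bytes are read as ISO-8859-1 characters. The class ByteFragmentSketch and the method asTerm are hypothetical names used only for this illustration; the library itself need not build a String at all.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing the term denoted by a byte-array fragment. */
class ByteFragmentSketch {
    /** Decodes len bytes starting at off, one ISO-8859-1 character per byte. */
    static String asTerm(byte[] a, int off, int len) {
        return new String(a, off, len, StandardCharsets.ISO_8859_1);
    }
}
```

Under this reading, looking up a fragment should give the same answer as looking up the decoded character sequence, provided every character of the term fits in a single ISO-8859-1 byte.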

initSignatures

protected abstract void initSignatures(Iterable<? extends CharSequence> terms)
Deprecated. 
Sets up the signature system from a collection.

This abstract method must be overridden by implementing subclasses. It must set up all data structures that are necessary to handle signatures; in particular, it will usually compute signatures for all terms in the given collection.

Parameters:
terms - the collection of terms given to the constructor of this class.
See Also:
HashCodeSignedMinimalPerfectHash.initSignatures(Iterable), LiterallySignedMinimalPerfectHash.initSignatures(Iterable)

checkSignature

protected abstract boolean checkSignature(CharSequence term,
                                          int index)
Deprecated. 
Checks a signature against a character sequence.

This abstract method must be overridden by implementing subclasses. It must check whether the signature of the given character sequence matches the one stored for the index-th term.

Note that this method and checkSignature(byte[], int, int, int) must be coherent.

Parameters:
term - a character sequence.
index - an integer denoting a term in the indexed collection.
Returns:
true iff the signature of the given character sequence matches the one stored for the index-th term.
See Also:
HashCodeSignedMinimalPerfectHash.checkSignature(CharSequence, int), LiterallySignedMinimalPerfectHash.checkSignature(CharSequence,int)

checkSignature

protected abstract boolean checkSignature(byte[] a,
                                          int off,
                                          int len,
                                          int index)
Deprecated. 
Checks a signature against a byte-array fragment.

This abstract method must be overridden by implementing subclasses. It must check whether the signature of the given byte-array fragment (interpreted as an ISO-8859-1 string) matches the one stored for the index-th term.

Note that this method and checkSignature(CharSequence, int) must be coherent.

Parameters:
a - a byte array.
off - the first valid byte in a.
len - the number of bytes composing the term, starting at off.
index - an integer denoting a term in the indexed collection.
Returns:
true iff the signature of the term defined by len bytes starting at off (interpreted as ISO-8859-1 characters) matches the one stored for the index-th term.
See Also:
HashCodeSignedMinimalPerfectHash.checkSignature(CharSequence, int), LiterallySignedMinimalPerfectHash.checkSignature(CharSequence,int)
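The coherence requirement between the two checkSignature methods can be illustrated with a pair of signature functions that agree on the same term. This is a sketch under assumptions, not the scheme used by any MG4J subclass: the class CoherentSignatureSketch and the polynomial hash (the one used by String.hashCode) are chosen only for illustration.

```java
/** Hypothetical pair of signature functions that agree on chars and bytes. */
class CoherentSignatureSketch {
    /** Signature of a character sequence: h = 31 * h + c over all characters. */
    static int signature(CharSequence term) {
        int h = 0;
        for (int i = 0; i < term.length(); i++) {
            h = 31 * h + term.charAt(i);
        }
        return h;
    }

    /** Signature of a byte fragment read as ISO-8859-1 characters. */
    static int signature(byte[] a, int off, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            // Mask to 0..255: a plain byte above 127 would be sign-extended
            // and no longer match the corresponding character.
            h = 31 * h + (a[off + i] & 0xFF);
        }
        return h;
    }
}
```

The mask is the detail that keeps the two variants coherent for terms containing characters above 127; without it the byte-array variant would reject terms that the character-sequence variant accepts.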

asUnsigned

public MinimalPerfectHash asUnsigned()
Deprecated. 
Returns an unsigned view of this signed minimal perfect hash.

Returns:
an unsigned view of this minimal perfect hash.

main

public static void main(String[] arg)
                 throws InstantiationException,
                        IllegalAccessException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        IOException,
                        com.martiansoftware.jsap.JSAPException,
                        ClassNotFoundException
Deprecated. 
Throws:
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
IOException
com.martiansoftware.jsap.JSAPException
ClassNotFoundException