it.unimi.dsi.sux4j.mph
Class MWHCFunction<T>

java.lang.Object
  extended by it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T>
      extended by it.unimi.dsi.sux4j.mph.MWHCFunction<T>
All Implemented Interfaces:
Function<T,java.lang.Long>, Object2LongFunction<T>, java.io.Serializable

public class MWHCFunction<T>
extends AbstractObject2LongFunction<T>
implements java.io.Serializable

An immutable function stored using the Majewski-Wormald-Havas-Czech 3-hypergraph technique.

Instance of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a LongList. A convenient constructor automatically assigns to each key its ordinal value.

As a commodity, this class provides a main method that reads from standard input a (possibly gzip'd) sequence of newline-separated strings, and writes a serialised function mapping each element of the list to its position.

Implementation Details

After generating a random 3-hypergraph with suitably sorted edges, we assign to each vertex a value in such a way that for each 3-hyperedge the xor of the three values associated to its vertices is the required value for the corresponding element of the function domain. Then, we measure whether it is favourable to store nonzero values in a separate array, using a ranked marker array to record the positions of the values equal to zero.

Since:
0.2
Author:
Sebastiano Vigna
See Also:
Serialized Form

Field Summary
protected  LongBigList data
          The final magick—the list of modulo-3 values that define the output of the minimal hash function.
protected  int m
          The number of vertices of the intermediate hypergraph.
protected  LongArrayBitVector marker
          Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.
protected  int n
          The number of elements.
protected  Rank16 rank
          The ranking structure on marker.
protected  long seed
          The seed of the underlying 3-hypergraph.
static long serialVersionUID
           
protected  TransformationStrategy<? super T> transform
          The transformation strategy to turn objects of type T into bit vectors.
protected  int width
          The data width.
 
Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defRetValue
 
Constructor Summary
  MWHCFunction(java.lang.Iterable<? extends T> elements, TransformationStrategy<? super T> transform)
          Creates a new function for the given elements, assigning to each element its ordinal position.
  MWHCFunction(java.lang.Iterable<? extends T> elements, TransformationStrategy<? super T> transform, LongList values, int width)
          Creates a new function for the given elements and values.
protected MWHCFunction(MWHCFunction<T> function)
          Creates a new function by copying a given one; non-transient fields are (shallow) copied.
 
Method Summary
 boolean containsKey(java.lang.Object o)
           
 long getLong(java.lang.Object o)
           
static void main(java.lang.String[] arg)
           
 long numBits()
          Returns the number of bits used by this structure.
 int size()
          Returns the number of elements in the function domain.
 
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
clear, defaultReturnValue, defaultReturnValue, get, put, put, remove, removeLong
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

public static final long serialVersionUID
See Also:
Constant Field Values

n

protected final int n
The number of elements.


m

protected final int m
The number of vertices of the intermediate hypergraph.


width

protected final int width
The data width.


seed

protected final long seed
The seed of the underlying 3-hypergraph.


data

protected final LongBigList data
The final magick—the list of modulo-3 values that define the output of the minimal hash function.


marker

protected final LongArrayBitVector marker
Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.


rank

protected final Rank16 rank
The ranking structure on marker.


transform

protected final TransformationStrategy<? super T> transform
The transformation strategy to turn objects of type T into bit vectors.

Constructor Detail

MWHCFunction

public MWHCFunction(java.lang.Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform)
Creates a new function for the given elements, assigning to each element its ordinal position.

Parameters:
elements - the elements in the domain of the function.
transform - a transformation strategy for the elements.

MWHCFunction

public MWHCFunction(java.lang.Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform,
                    LongList values,
                    int width)
Creates a new function for the given elements and values.

Parameters:
elements - the elements in the domain of the function.
transform - a transformation strategy for the elements.
values - values to be assigned to each element, in the same order of the iterator returned by elements; if null, the assigned value will the the ordinal number of each element.
width - the bit width of the values.

MWHCFunction

protected MWHCFunction(MWHCFunction<T> function)
Creates a new function by copying a given one; non-transient fields are (shallow) copied.

Parameters:
function - the function to be copied.
Method Detail

getLong

public long getLong(java.lang.Object o)
Specified by:
getLong in interface Object2LongFunction<T>

size

public int size()
Returns the number of elements in the function domain.

Specified by:
size in interface Function<T,java.lang.Long>
Returns:
the number of the elements in the function domain.

numBits

public long numBits()
Returns the number of bits used by this structure.

Returns:
the number of bits used by this structure.

containsKey

public boolean containsKey(java.lang.Object o)
Specified by:
containsKey in interface Function<T,java.lang.Long>

main

public static void main(java.lang.String[] arg)
                 throws java.lang.NoSuchMethodException,
                        java.io.IOException,
                        JSAPException
Throws:
java.lang.NoSuchMethodException
java.io.IOException
JSAPException