Package it.unimi.dsi.sux4j.mph

Minimal perfect hash functions.

See:
          Description

Class Summary
AbstractHashFunction<K> A very minimal abstract hash implementation.
BitstreamImmutablePaCoTrie<T> A succinct implementation of a binary partial compacted trie based on a recursive bitstream.
Hashes Basic hash functions.
HollowTrie<T> A hollow trie, that is, a compacted trie recording just the length of the paths associated to the internal nodes.
HollowTrieDistributor<T> A distributor based on a hollow trie.
HollowTrieMonotoneMinimalPerfectHashFunction<T> A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a hollow trie as a distributor.
HypergraphSorter<T> A class implementing the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique.
LcpMonotoneMinimalPerfectHashFunction<T> A monotone minimal perfect hash implementation based on fixed-size bucketing that uses longest common prefixes as distributors.
MinimalPerfectHashFunction<T> A minimal perfect hash function.
MWHCFunction<T> An immutable function stored using the Majewski-Wormald-Havas-Czech 3-hypergraph technique.
PaCoTrieMonotoneMinimalPerfectHashFunction<T> A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a partial compacted binary trie (PaCo trie) as distributor.
RelativeTrieDistributor<T> A distributor based on a relative trie.
RelativeTrieMonotoneMinimalPerfectHashFunction<T> A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a relative trie as a distributor.
TwoStepsLcpMonotoneMinimalPerfectHashFunction<T> A monotone minimal perfect hash implementation based on fixed-size bucketing that uses longest common prefixes as distributors, and store their lengths using a TwoStepsMWHCFunction.
TwoStepsMWHCFunction<T> A read-only function stored using two Majewski-Wormald-Havas-Czech functions—one for frequent values, and one for infrequent values.
 

Package it.unimi.dsi.sux4j.mph Description

Minimal perfect hash functions.

Package Specification

This package provides a number of state-of-the-art implementations of static (i.e., immutable) minimal perfect hash functions, and, more generally, of static functions from objects to integers. The classes can be gathered in three broad groups:

LcpMonotoneMinimalPerfectHashFunction and RelativeTrieMonotoneMinimalPerfectHashFunction were introduced by Djamal Belazzougui, Paolo Boldi, Rasmus Pagh and Sebastiano Vigna in “Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses”, Proc. of the 20th Annual ACM–SIAM Symposium On Discrete Mathematics (SODA), ACM Press, 2009. TwoStepsLcpMonotoneMinimalPerfectHashFunction, PaCoTrieMonotoneMinimalPerfectHashFunction, HollowTrie and HollowTrieMonotoneMinimalPerfectHashFunction were introduced by the same authors in “Theory and Practise of Minimal Monotone Perfect Hashing” (the class MWHCFunction implements a compacted version of the classical 3-hypergraph-based structure introduced therein).

Usage

Functions in this package implement the Object2LongFunction interface. However, the underlying machinery manipulates bit vectors only. To bring you own data into the bit vector world, each contructor requires to specify a transformation strategy that maps your objects into bit vectors. For instance, TransformationStrategies.utf16(), TransformationStrategies.prefixFreeUtf16(), TransformationStrategies.iso(), and TransformationStrategies.prefixFreeIso() are ready-made strategies that can be used with character sequences.

Note that if you plain to use monotone hashing, you must provide objects in an order such that the corresponding bit vectors are lexicographically ordered. For instance, TransformationStrategies.utf16() obtain this results by concatenating the reversed 16-bit representation of each character.

Signing functions

All functions in this package will return a value in their range for most of the keys that are not in their domain. In other words, they will produce false positives; in the few cases in which it is possible to detect a negative, you will get the default return value.

If you are interested in getting a more precise behaviour (e.g., you are migrating from the deprecated SignedMinimalPerfectHash class that was distributed with MG4J), you can sign a map, that is, you can record a signature for each key and use it to filter false positives. A signing class for character sequences is provided by the DSI utilities class ShiftAddXorSignedStringMap. By creating a function using one of the implementation provided with Sux4J and signing it using the above class, you can obtain the same functionality of the old signed classes, but you can choose the size of the signature, whether to require monotonicity, and also the space/time tradeoff of your function.