it.unimi.dsi.mg4j.util
Class SemiExternalOffsetList
java.lang.Object
it.unimi.dsi.fastutil.longs.AbstractLongCollection
it.unimi.dsi.fastutil.longs.AbstractLongList
it.unimi.dsi.mg4j.util.SemiExternalOffsetList
- All Implemented Interfaces:
- LongCollection, LongIterable, LongList, LongStack, Stack<Long>, Comparable<List<? extends Long>>, Iterable<Long>, Collection<Long>, List<Long>
public class SemiExternalOffsetList
- extends AbstractLongList
Provides semi-external random access to offsets of an index.
This class is a semi-external LongList that MG4J uses as default for accessing term offsets.
When the number of terms in the index grows, storing each offset as a long in an
array can consume hundreds of megabytes of memory, and most of this memory is wasted,
as it is occupied by offsets of hapax legomena (terms occurring just once in the
collection). Instead, this class accesses offsets in their
compressed form, and provides entry points for random access to each offset. At construction
time, entry points are computed with a certain step, which is the number of offsets
accessible from each entry point, or, equivalently, the maximum number of offsets that
must be read to access a given offset.
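The entry-point scheme described above can be illustrated with a small, self-contained sketch. It is not the MG4J implementation: the class name SteppedOffsetSketch, the use of java.util.BitSet as the bit stream, and the choice to keep the first offset of each block uncompressed are all assumptions made for illustration. Gaps between consecutive offsets are γ-coded, and a lookup decodes at most step - 1 gaps starting from the nearest entry point.

```java
import java.util.BitSet;

/** Hypothetical, simplified model of stepped entry points; not the MG4J implementation. */
final class SteppedOffsetSketch {
    private final BitSet bits = new BitSet(); // γ-coded gaps between consecutive offsets
    private final int[] entryBit;             // bit position where each block's gaps start
    private final long[] entryOffset;         // first offset of each block, kept uncompressed
    private final int step;
    private final int size;

    SteppedOffsetSketch(long[] offsets, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.step = step;
        this.size = offsets.length;
        int blocks = (size + step - 1) / step;
        entryBit = new int[blocks];
        entryOffset = new long[blocks];
        int pos = 0;
        for (int i = 0; i < size; i++) {
            if (i % step == 0) {              // entry point: remember value and stream position
                entryOffset[i / step] = offsets[i];
                entryBit[i / step] = pos;
            } else {
                long gap = offsets[i] - offsets[i - 1];
                if (gap < 0) throw new IllegalArgumentException("offsets must be nondecreasing");
                pos = writeGamma(pos, gap);
            }
        }
    }

    /** Writes x >= 0 in γ code: the bit length of x + 1 in unary, then the bits below its top one. */
    private int writeGamma(int pos, long x) {
        long v = x + 1;
        int msb = 63 - Long.numberOfLeadingZeros(v);
        pos += msb;                           // msb zeros (BitSet bits default to 0)
        bits.set(pos++);                      // terminating one of the unary part
        for (int b = msb - 1; b >= 0; b--, pos++) {
            if (((v >>> b) & 1L) != 0) bits.set(pos);
        }
        return pos;
    }

    /** Jumps to the entry point of the block containing index, then decodes index % step gaps. */
    long getLong(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        int block = index / step;
        long value = entryOffset[block];
        int pos = entryBit[block];
        for (int k = index % step; k > 0; k--) {
            int msb = 0;
            while (!bits.get(pos++)) msb++;   // unary part: count zeros up to the first one
            long v = 1;
            for (int b = 0; b < msb; b++) v = (v << 1) | (bits.get(pos++) ? 1 : 0);
            value += v - 1;                   // decoded gap
        }
        return value;
    }

    int size() { return size; }
}
```

In this sketch a larger step means fewer entry points held in memory but more gaps decoded per lookup, which is the trade-off the step parameter controls.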
Warning: This class is not thread safe, and needs to be synchronised to be used in a
multithreaded environment.
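One way to follow the warning above is to route every lookup through a single lock. The wrapper below is a minimal sketch under that assumption: SynchronizedLookup is a hypothetical name, and the delegate is modelled as a plain java.util.function.IntToLongFunction (for instance a method reference such as offsets::getLong) so that the example does not depend on MG4J.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical wrapper that serialises all lookups on one private lock. */
final class SynchronizedLookup implements IntToLongFunction {
    private final IntToLongFunction delegate; // the non-thread-safe lookup, e.g. offsets::getLong
    private final Object lock = new Object();

    SynchronizedLookup(IntToLongFunction delegate) {
        this.delegate = delegate;
    }

    @Override
    public long applyAsLong(int index) {
        synchronized (lock) {                 // only one thread decodes at a time
            return delegate.applyAsLong(index);
        }
    }
}
```

Serialising lookups is simple but makes threads wait for one another; giving each thread its own list instance is an alternative when memory allows.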
- Author:
- Fabien Campagne, Sebastiano Vigna
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongList
add, add, add, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, get, getElements, hashCode, indexOf, indexOf, iterator, lastIndexOf, lastIndexOf, listIterator, listIterator, longListIterator, longListIterator, longSubList, peek, peekLong, pop, popLong, push, push, rem, remove, remove, removeElements, removeLong, set, set, size, subList, top, topLong, toString
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongCollection
add, clear, contains, containsAll, containsAll, isEmpty, longIterator, rem, removeAll, removeAll, retainAll, retainAll, toArray, toArray, toArray, toLongArray, toLongArray
Methods inherited from interface it.unimi.dsi.fastutil.Stack
isEmpty
SemiExternalOffsetList
public SemiExternalOffsetList(InputBitStream offsetRawData,
int offsetStep,
int numOffsets)
throws IOException
- Creates a new semi-external list.
- Parameters:
offsetRawData - a bit stream containing the offsets in compressed form (γ-encoded deltas).
offsetStep - the step used to build random-access entry points.
numOffsets - the overall number of offsets (i.e., the number of terms).
- Throws:
IOException
getLong
public final long getLong(int index)
size
public int size()