it.unimi.dsi.mg4j.tool
Class SecondPass
java.lang.Object
it.unimi.dsi.mg4j.tool.SecondPass
- All Implemented Interfaces:
- CompressionFlags
- public final class SecondPass extends Object implements CompressionFlags
Builds an inverted index by merging occurrence batches produced by FirstPass.
Some statistics will be printed to standard output at the end of the indexing process.
This class merges the occurrence files produced by FirstPass (and possibly
reordered by MiddlePass) into an inverted index.
These are the files currently generated:
- basename.index
- The inverted index.
- basename.offsets
- For each term, the byte offset in basename.index at which its inverted list starts.
More precisely, the first integer is the offset for term 0 in γ coding, and then,
for i > 0, the i-th integer is the difference between the i-th and
the (i−1)-th offset, again in γ coding. If T
terms were indexed, this file will contain T+1 integers,
the last being the difference (in bytes) between the length of the entire
inverted index and the offset of the last inverted list.
- basename.globcounts
- For each term, the number of its occurrences
throughout the whole document collection, in γ coding. More
precisely, the i-th integer of the file (starting from 0) is the
number of occurrences of the term of index i.
- basename.properties
- This class adds some information to the property file produced by FirstPass.
Currently, the following keys are generated:
- compressionflags
- the mask of compression flags used when generating the index;
- maxcount
- the maximum count in the collection, that is, the largest number of
occurrences of any single term in any single document.
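The gap layout of basename.offsets described above can be turned back into absolute positions with a running sum. The following is a minimal sketch, not MG4J code: it assumes the T+1 integers have already been γ-decoded into a long array, and the class and method names (OffsetsSketch, toAbsolute, listLength) are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MG4J: works on already-decoded gap values.
final class OffsetsSketch {

    /**
     * Turns the T+1 gap values into T+1 absolute positions via a running sum.
     * Entries 0..T-1 are the start offsets of the inverted lists; entry T is
     * the length of the whole index.
     */
    static long[] toAbsolute(long[] gaps) {
        long[] absolute = new long[gaps.length];
        long sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            absolute[i] = sum;
        }
        return absolute;
    }

    /** Length of the inverted list of the given term (0 <= term < T). */
    static long listLength(long[] absolute, int term) {
        return absolute[term + 1] - absolute[term];
    }

    public static void main(String[] args) {
        // Made-up example values for T = 3 terms.
        long[] gaps = {0, 5, 7, 3};
        long[] absolute = toAbsolute(gaps);
        System.out.println(Arrays.toString(absolute)); // prints [0, 5, 12, 15]
        System.out.println(listLength(absolute, 1));   // prints 7
    }
}
```

Because the last integer closes the final list, the length of the inverted list of every term, including the last one, is simply the difference between two consecutive absolute positions.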
- Since:
- 0.6
- Author:
- Sebastiano Vigna
Fields inherited from interface it.unimi.dsi.mg4j.index.CompressionFlags
ARITH, CODING_NAME, COUNTS_DEFAULT, COUNTS_DELTA, COUNTS_GAMMA, COUNTS_SHIFT, DELTA, FREQUENCIES_DEFAULT, FREQUENCIES_DELTA, FREQUENCIES_GAMMA, FREQUENCIES_SHIFT, GAMMA, GOLOMB, INTERP, NIBBLE, NO_COUNTS, NO_POSITIONS, NONE, POINTERS_DEFAULT, POINTERS_DELTA, POINTERS_GAMMA, POINTERS_GOLOMB, POINTERS_SHIFT, POSITIONS_ARITH, POSITIONS_DEFAULT, POSITIONS_DELTA, POSITIONS_GAMMA, POSITIONS_GOLOMB, POSITIONS_INTERP, POSITIONS_SHIFT, POSITIONS_SKEWED_GOLOMB, SKEWED_GOLOMB, UNARY, ZETA
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
main
public static void main(String[] arg)