it.unimi.dsi.mg4j.io
Class ByteArrayPostingList

java.lang.Object
  extended by it.unimi.dsi.mg4j.io.ByteArrayPostingList
All Implemented Interfaces:
Flushable

public class ByteArrayPostingList
extends Object
implements Flushable

Lightweight posting accumulator whose format is similar to that generated by BitStreamIndexWriter.

This class is essentially a dirty trick: it borrows some code and precomputed tables from OutputBitStream and exposes two simple methods (setDocumentPointer(int) and addPosition(int)) with obvious semantics. The resulting posting list is compressed exactly as a BitStreamIndexWriter would compress it (again duplicating some logic found therein). As a result, once all calls have been made and flush() has been invoked, the internal buffer can be written directly to a bit stream to build an index (but see stripPointers(OutputBitStream, long)).

Scan uses an instance of this class for each indexed term. Instances can be differential, in which case they assume setDocumentPointer(int) will be called with increasing values and store gaps rather than document pointers.
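
The idea behind differential instances can be illustrated with a small, self-contained sketch. It is a toy model, not MG4J code: the class GapSketch and its methods are hypothetical, and the convention used here (first pointer kept as is, later pointers stored as differences) is an assumption that may not match the exact encoding used by this class.

```java
import java.util.Arrays;

// Toy model of differential pointer storage. This is not MG4J code, and the
// exact gap convention used by ByteArrayPostingList may differ.
final class GapSketch {

    // Turns strictly increasing document pointers into gaps.
    // Assumption: the first pointer is kept as is, later ones become differences.
    static int[] toGaps(int[] pointers) {
        int[] gaps = new int[pointers.length];
        int previous = 0;
        for (int i = 0; i < pointers.length; i++) {
            if (i > 0 && pointers[i] <= previous) {
                throw new IllegalArgumentException("Pointers must be increasing");
            }
            gaps[i] = pointers[i] - previous;
            previous = pointers[i];
        }
        return gaps;
    }

    // Rebuilds the original pointers by summing the gaps.
    static int[] fromGaps(int[] gaps) {
        int[] pointers = new int[gaps.length];
        int current = 0;
        for (int i = 0; i < gaps.length; i++) {
            current += gaps[i];
            pointers[i] = current;
        }
        return pointers;
    }

    public static void main(String[] args) {
        int[] gaps = toGaps(new int[] { 3, 7, 8, 20 });
        System.out.println(Arrays.toString(gaps)); // prints [3, 4, 1, 12]
    }
}
```

Gaps between increasing pointers are typically much smaller than the pointers themselves, which is why a differential list requires increasing arguments.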

Since:
1.2
Author:
Sebastiano Vigna

Field Summary
 byte[] buffer
          The internal buffer.
 int frequency
          The current frequency (number of calls to setDocumentPointer(int)).
 long globCount
          The current global count.
 int maxCount
          The maximum count ever seen.
 boolean outOfMemoryError
          If true, this list experienced an OutOfMemoryError during some buffer reallocation.
 
Constructor Summary
ByteArrayPostingList(byte[] a, boolean differential)
          Creates a new posting list wrapping a given byte array.
 
Method Summary
 void addPosition(int pos)
          Adds a new position for the current document pointer.
 int align()
          Flushes the internal bit buffer to the byte buffer.
 void flush()
          Flushes the positions cached internally.
 void setDocumentPointer(int pointer)
          Sets the current document pointer.
 void stripPointers(OutputBitStream obs, long bitLength)
          Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.
 long writtenBits()
          Returns the number of bits written by this posting list.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

buffer

public byte[] buffer
The internal buffer.


frequency

public int frequency
The current frequency (number of calls to setDocumentPointer(int)).


globCount

public long globCount
The current global count.


maxCount

public int maxCount
The maximum count ever seen.


outOfMemoryError

public boolean outOfMemoryError
If true, this list experienced an OutOfMemoryError during some buffer reallocation.
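
As a rough illustration of how the three counters above could relate to one another, here is a minimal toy accumulator. It is not the MG4J implementation: PostingStatsSketch is a hypothetical class, and it assumes that a "count" is the number of positions added for one document, that globCount is the sum of all counts, and that frequency grows only when the document pointer changes.

```java
// Toy accumulator mirroring the public counters of this class.
// Not MG4J code: the meaning given to each counter is an assumption.
final class PostingStatsSketch {
    int frequency;   // documents seen so far
    long globCount;  // assumed: total number of positions over all documents
    int maxCount;    // assumed: largest number of positions in a single document

    private int currentPointer = -1; // no document selected yet
    private int currentCount;        // positions cached for the current document

    void setDocumentPointer(int pointer) {
        if (pointer != currentPointer) {
            flush();                 // close the previous document first
            currentPointer = pointer;
            frequency++;
        }
    }

    void addPosition(int pos) {
        currentCount++;              // the position value itself is ignored here
    }

    void flush() {
        globCount += currentCount;
        maxCount = Math.max(maxCount, currentCount);
        currentCount = 0;
    }
}
```

With two positions added for document 0 and three for document 3, followed by flush(), this sketch ends with frequency 2, globCount 5 and maxCount 3.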

Constructor Detail

ByteArrayPostingList

public ByteArrayPostingList(byte[] a,
                            boolean differential)
Creates a new posting list wrapping a given byte array.

Parameters:
a - the byte array to wrap.
differential - whether this stream should be differential (i.e., whether it should store document pointers as gaps).
Method Detail

align

public int align()
Flushes the internal bit buffer to the byte buffer.

Returns:
the number of bits written.
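
To make the distinction between the bit buffer and the byte buffer concrete, the following toy bit writer keeps up to seven pending bits and moves them to a byte array when aligned. It is a sketch under stated assumptions, not the MG4J code: BitBufferSketch is hypothetical, padding with zeros is assumed, and its align() returns the number of pending bits moved, which may not be what the real method reports.

```java
import java.io.ByteArrayOutputStream;

// Toy bit writer: bits accumulate in a small bit buffer and reach the
// byte buffer only when a byte is complete or align() is called.
// Not MG4J code; padding and return value are assumptions.
final class BitBufferSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int current;  // pending bits, most recent bit in the lowest position
    private int free = 8; // free bit slots left in the pending byte

    void writeBit(boolean bit) {
        current = (current << 1) | (bit ? 1 : 0);
        if (--free == 0) {
            bytes.write(current); // a full byte moves to the byte buffer
            current = 0;
            free = 8;
        }
    }

    // Pads the pending bits with zeros, writes them as one byte, and
    // returns how many pending bits were moved (0 if already aligned).
    int align() {
        if (free == 8) return 0;
        int pending = 8 - free;
        bytes.write(current << free);
        current = 0;
        free = 8;
        return pending;
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }
}
```

Writing the bits 1, 0, 1 and then calling align() leaves the single byte 0xA0 in the byte buffer; before the call the byte buffer is still empty.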

flush

public void flush()
Flushes the positions cached internally.

Specified by:
flush in interface Flushable

setDocumentPointer

public void setDocumentPointer(int pointer)
Sets the current document pointer.

If the document pointer has changed since the last call, the positions currently stored are flushed and the new pointer is written to the stream.

Parameters:
pointer - a document pointer.

addPosition

public void addPosition(int pos)
Adds a new position for the current document pointer.

Successive calls to this method for the same document pointer must have increasing arguments.

Parameters:
pos - a position.

writtenBits

public long writtenBits()
Returns the number of bits written by this posting list.

Returns:
the number of bits written by this posting list.

stripPointers

public void stripPointers(OutputBitStream obs,
                          long bitLength)
                   throws IOException
Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.

This method is a horrible kluge solving the problem of terms appearing in all documents: BitStreamIndexWriter would not write pointers in this case, but we do not know whether we will need pointers or not while we are filling the internal buffer. Thus, for those (hopefully few) terms appearing in all documents, this method can be used to dump the internal buffer while stripping all pointers.

Note that the valid number of bits should be retrieved using writtenBits() after a call to flush(). A subsequent call to align() will then dump to the byte buffer the bits still floating in the bit buffer; at that point this method can be called safely.

Parameters:
obs - an output bit stream.
bitLength - the number of bits to be scanned.
Throws:
IOException