it.unimi.dsi.mg4j.search
Class ConsecutiveDocumentIterator

java.lang.Object
  extended by it.unimi.dsi.fastutil.ints.AbstractIntIterator
      extended by it.unimi.dsi.mg4j.search.AbstractDocumentIterator
          extended by it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
              extended by it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
                  extended by it.unimi.dsi.mg4j.search.AbstractOrderedIntervalDocumentIterator
                      extended by it.unimi.dsi.mg4j.search.ConsecutiveDocumentIterator
All Implemented Interfaces:
IntIterator, DocumentIterator, Iterable<Interval>, Iterator<Integer>

public class ConsecutiveDocumentIterator
extends AbstractOrderedIntervalDocumentIterator

An iterator returning documents containing consecutive intervals (in query order) satisfying the underlying queries.

As an additional service, this class makes it possible to specify gaps between intervals. If gaps are specified, a match will satisfy the condition that the left extreme of the first interval is larger than or equal to the first gap, the left extreme of the second interval is larger than the right extreme of the first interval plus the second gap, and so on. The standard semantics thus corresponds to the gap array that is zero everywhere.

This semantics makes it possible to perform phrasal searches “with holes”, typically because of stopwords that have not been indexed. Note that it is possible to specify a gap before the first interval, but not after the last interval, as in general the document length is not known at this level of query resolution.
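The gap condition can be illustrated with a small self-contained sketch. It does not use MG4J: the class GapSketch and the method matches are hypothetical names, and intervals are modelled as plain {left, right} pairs. The sketch assumes that "consecutive" means each interval starts exactly gap positions after the previous one ends; the description above only states the corresponding inequality.

```java
class GapSketch {
    /**
     * Checks whether the given intervals ({left, right} pairs, in query order)
     * are consecutive under the given gaps. Hypothetical helper, not MG4J API.
     */
    static boolean matches(int[][] interval, int[] gap) {
        for (int i = 0; i < interval.length; i++) {
            // A null gap array is read as the gap array that is zero everywhere.
            int g = gap == null ? 0 : gap[i];
            int left = interval[i][0];
            if (i == 0) {
                // The first interval must leave room for the first gap.
                if (left < g) return false;
            } else {
                // Assumption: exactly g positions are skipped after the previous interval.
                int previousRight = interval[i - 1][1];
                if (left != previousRight + 1 + g) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "bank of america" where the stopword "of" was not indexed:
        // "bank" occurs at position 3 and "america" at position 5.
        int[][] hit = { {3, 3}, {5, 5} };
        System.out.println(matches(hit, new int[] {0, 1})); // true
        System.out.println(matches(hit, null));             // false
    }
}
```

With the gap array {0, 1} the hole left by the stopword is accepted, while the standard zero-gap semantics rejects the same pair of intervals.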

This class handles TRUE iterators correctly; in this case, the semantics is defined as follows: an interval is in the output if it is formed by the union of disjoint intervals, one from each input list, and each gap of value k corresponds to k iterators returning all document positions as singleton intervals. Since TRUE represents a list containing just the empty interval, the result is equivalent to dropping TRUE iterators from the input; as a consequence, the gap of a TRUE iterator is merged with that of the following iterator.

Warning: If gaps are specified, the mathematically correct semantics would require that gaps before TRUE iterators that are not followed by any non-TRUE iterator have the effect of enlarging the resulting intervals on the right side. However, this behaviour is very difficult to implement at this level because document lengths are not known. For this reason, if one or more TRUE iterators appear at the end of the component iterator list, they will simply be dropped.
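The treatment of TRUE iterators can be sketched in the same way. The class TrueGapSketch and the method mergeGaps are hypothetical names, and TRUE iterators are modelled by a boolean flag array rather than by MG4J objects. The sketch assumes that "merged" means the gap of a TRUE iterator is added to the gap of the next non-TRUE iterator, and it drops trailing TRUE iterators together with their gaps, as the warning above describes.

```java
import java.util.Arrays;

class TrueGapSketch {
    /**
     * Returns the gaps of the non-TRUE iterators after dropping TRUE iterators.
     * Hypothetical helper, not MG4J API.
     */
    static int[] mergeGaps(boolean[] isTrue, int[] gap) {
        int[] merged = new int[gap.length];
        int kept = 0;
        int carry = 0; // gaps of dropped TRUE iterators waiting for the next non-TRUE iterator
        for (int i = 0; i < gap.length; i++) {
            if (isTrue[i]) {
                // Assumption: merging a gap means summing it into the following one.
                carry += gap[i];
            } else {
                merged[kept++] = carry + gap[i];
                carry = 0;
            }
        }
        // Any gap still carried here belongs to trailing TRUE iterators and is dropped.
        return Arrays.copyOf(merged, kept);
    }

    public static void main(String[] args) {
        boolean[] isTrue = { false, true, false, true };
        int[] gap = { 0, 1, 2, 3 };
        System.out.println(Arrays.toString(mergeGaps(isTrue, gap))); // [0, 3]
    }
}
```

The second iterator is TRUE, so its gap of 1 is added to the gap of 2 of the third iterator; the final TRUE iterator and its gap of 3 disappear from the result.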


Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractOrderedIntervalDocumentIterator
AbstractOrderedIntervalDocumentIterator.AbstractOrderedIndexIntervalIterator, AbstractOrderedIntervalDocumentIterator.AbstractOrderedIntervalIterator
 
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator
 
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator
AbstractDocumentIterator.AbstractIntervalIterator
 
Field Summary
 
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractOrderedIntervalDocumentIterator
ASSERTS, DEBUG
 
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
currentIterators, intervalIterators, unmodifiableCurrentIterators
 
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
documentIterator, indexIterator, indices, n, soleIndex
 
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator
last, next, weight
 
Constructor Summary
protected ConsecutiveDocumentIterator(DocumentIterator[] documentIterator, int[] gap)
           
 
Method Summary
protected  IntervalIterator getComposedIntervalIterator(Index unused)
           
static DocumentIterator getInstance(DocumentIterator... documentIterator)
          Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators.
static DocumentIterator getInstance(DocumentIterator[] documentIterator, int[] gap)
          Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators, adding gaps between intervals.
static DocumentIterator getInstance(Index index, DocumentIterator... documentIterator)
          Returns a document iterator that computes the consecutive AND of the given array of iterators.
 
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractOrderedIntervalDocumentIterator
nextDocument, skipTo
 
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
intervalIterator, intervalIterators
 
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
accept, acceptOnTruePaths, dispose, indices, intervalIterator, toString
 
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator
document, hasNext, iterator, nextInt, weight, weight
 
Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator
next, remove, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator
document, iterator, nextInt, weight, weight
 
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator
skip
 
Methods inherited from interface java.util.Iterator
hasNext, next, remove
 

Constructor Detail

ConsecutiveDocumentIterator

protected ConsecutiveDocumentIterator(DocumentIterator[] documentIterator,
                                      int[] gap)
                               throws IOException
Throws:
IOException
Method Detail

getInstance

public static DocumentIterator getInstance(Index index,
                                           DocumentIterator... documentIterator)
                                    throws IOException
Returns a document iterator that computes the consecutive AND of the given array of iterators.

Note that the special cases of the empty and of the singleton array are handled efficiently.

Parameters:
index - the default index; relevant only if documentIterator has zero length.
documentIterator - the iterators to be composed.
Returns:
a document iterator that computes the consecutive AND of documentIterator.
Throws:
IOException

getInstance

public static DocumentIterator getInstance(DocumentIterator... documentIterator)
                                    throws IOException
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators.

Note that the special case of the singleton array is handled efficiently.

Parameters:
documentIterator - the iterators to be composed (at least one).
Returns:
a document iterator that computes the consecutive AND of documentIterator.
Throws:
IOException

getInstance

public static DocumentIterator getInstance(DocumentIterator[] documentIterator,
                                           int[] gap)
                                    throws IOException
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators, adding gaps between intervals.

A match will satisfy the condition that the left extreme of the first interval is larger than or equal to the first gap, the left extreme of the second interval is larger than the right extreme of the first interval plus the second gap, and so on. This semantics makes it possible to perform phrasal searches “with holes”, typically because of stopwords that have not been indexed.

Parameters:
documentIterator - the iterators to be composed (at least one).
gap - an array of gaps parallel to documentIterator, or null for no gaps.
Returns:
a document iterator that computes the consecutive AND of documentIterator using the given gaps.
Throws:
IOException

getComposedIntervalIterator

protected IntervalIterator getComposedIntervalIterator(Index unused)
Specified by:
getComposedIntervalIterator in class AbstractIntersectionDocumentIterator