it.unimi.dsi.mg4j.search
Class ConsecutiveDocumentIterator
java.lang.Object
it.unimi.dsi.fastutil.ints.AbstractIntIterator
it.unimi.dsi.mg4j.search.AbstractDocumentIterator
it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
it.unimi.dsi.mg4j.search.AbstractOrderedIntervalDocumentIterator
it.unimi.dsi.mg4j.search.ConsecutiveDocumentIterator
- All Implemented Interfaces:
- IntIterator, DocumentIterator, Iterable<Interval>, Iterator<Integer>
public class ConsecutiveDocumentIterator
- extends AbstractOrderedIntervalDocumentIterator
An iterator returning documents containing consecutive intervals (in query order)
satisfying the underlying queries.
As an additional service, this class makes it possible to specify gaps between
intervals. If gaps are specified, a match will satisfy the condition
that the left extreme of the first interval is larger than or equal to the
first gap, the left extreme of the second interval is larger than
the right extreme of the first interval plus the second gap, and so on. The standard
semantics thus corresponds to the everywhere-zero gap array.
This semantics
makes it possible to perform phrasal searches “with holes”, typically
because of stopwords that have not been indexed. Note that it is possible to specify
a gap before the first interval, but not after the last interval,
as in general the document length is not known at this level of query resolution.
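The gap condition can be illustrated with a small, self-contained sketch. It does not use MG4J: `GapSketch` and `matches` are hypothetical names introduced only for illustration, and intervals are modelled as parallel arrays of left and right extremes. As an assumption, the sketch reads "consecutive" in the tightest sense, so that each interval starts exactly `gap[i] + 1` positions after the right extreme of the previous one; this implies the inequality stated above and reduces to plain adjacency when all gaps are zero.

```java
public class GapSketch {
    /**
     * Hypothetical helper (not part of MG4J): checks whether the intervals
     * [left[i]..right[i]], listed in query order, form a match for the given gaps.
     * Assumption: each interval must start exactly gap[i] + 1 positions after
     * the right extreme of the previous interval.
     */
    public static boolean matches(int[] left, int[] right, int[] gap) {
        // The first interval must leave room for the initial gap.
        if (left[0] < gap[0]) return false;
        for (int i = 1; i < left.length; i++) {
            // A hole of gap[i] unindexed positions separates interval i - 1 from interval i.
            if (left[i] != right[i - 1] + gap[i] + 1) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Two single-term intervals with a one-position hole between them,
        // e.g. a stopword that was not indexed.
        int[] gap = {0, 1};
        System.out.println(matches(new int[]{3, 5}, new int[]{3, 5}, gap)); // prints "true"
        System.out.println(matches(new int[]{3, 4}, new int[]{3, 4}, gap)); // prints "false"
    }
}
```

In the first call the terms sit at positions 3 and 5, leaving exactly one position for the hole; in the second call they are adjacent, so the required hole is missing.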
This class correctly handles TRUE iterators; in this
case, the semantics is defined as follows: an interval is in the output if it is formed by the union of disjoint intervals,
one from each input list, and each gap of value k corresponds to k iterators
returning all document positions as singleton intervals. Since TRUE represents a list containing just
the empty interval, the result is equivalent to dropping TRUE iterators from the input; as
a consequence, the gap of a TRUE iterator is merged with that of the following iterator.
Warning: If gaps are specified, the mathematically correct semantics would require that
gaps before TRUE iterators that are not followed by any non-TRUE iterators
have the effect of enlarging the resulting intervals on the right side. However,
this behaviour is very difficult to implement at this level because document lengths are not known. For this
reason, if one or more TRUE iterators appear at the end of the component iterator list, they will simply be dropped.
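The treatment of TRUE iterators described above can be sketched as a preprocessing step on the gap array. This is only an illustration, not the MG4J implementation: `TrueGapMergeSketch` and `effectiveGaps` are hypothetical names, TRUE iterators are modelled by a boolean flag per component, and "merging" a gap is taken to mean adding it to the gap of the next non-TRUE component, which follows from viewing a gap of value k as k iterators of singleton intervals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrueGapMergeSketch {
    /**
     * Hypothetical helper (not part of MG4J): returns the gaps that apply to the
     * non-TRUE components once TRUE components have been dropped.
     * isTrue[i] flags the i-th component as a TRUE iterator; gap is parallel to it.
     */
    public static int[] effectiveGaps(boolean[] isTrue, int[] gap) {
        List<Integer> result = new ArrayList<>();
        int pending = 0; // gaps accumulated from dropped TRUE components
        for (int i = 0; i < gap.length; i++) {
            pending += gap[i];
            if (!isTrue[i]) {
                result.add(pending);
                pending = 0;
            }
        }
        // Whatever is still pending belongs to trailing TRUE components,
        // which are dropped together with their gaps.
        int[] out = new int[result.size()];
        for (int i = 0; i < out.length; i++) out[i] = result.get(i);
        return out;
    }

    public static void main(String[] args) {
        boolean[] isTrue = {false, true, false, true};
        int[] gap = {0, 1, 2, 3};
        System.out.println(Arrays.toString(effectiveGaps(isTrue, gap))); // prints "[0, 3]"
    }
}
```

Here the gap of the TRUE component in second position (1) is added to the gap of the component that follows it (2), while the trailing TRUE component and its gap (3) disappear.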
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator: skip
ConsecutiveDocumentIterator
protected ConsecutiveDocumentIterator(DocumentIterator[] documentIterator,
int[] gap)
throws IOException
- Throws:
IOException
getInstance
public static DocumentIterator getInstance(int numberOfDocuments,
DocumentIterator... documentIterator)
throws IOException
- Returns a document iterator that computes the consecutive AND of the given array of iterators.
Note that the special cases of the empty and singleton arrays
are handled efficiently.
- Parameters:
numberOfDocuments
- the number of documents; relevant only if documentIterator has zero length.
documentIterator
- the iterators to be composed.
- Returns:
- a document iterator that computes the consecutive AND of documentIterator.
- Throws:
IOException
getInstance
public static DocumentIterator getInstance(DocumentIterator... documentIterator)
throws IOException
- Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators.
Note that the special case of the singleton array is handled efficiently.
- Parameters:
documentIterator
- the iterators to be composed (at least one).
- Returns:
- a document iterator that computes the consecutive AND of documentIterator.
- Throws:
IOException
getInstance
public static DocumentIterator getInstance(DocumentIterator[] documentIterator,
int[] gap)
throws IOException
- Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators, adding
gaps between intervals.
A match will satisfy the condition
that the left extreme of the first interval is larger than or equal to the
first gap, the left extreme of the second interval is larger than
the right extreme of the first interval plus the second gap, and so on. This semantics
makes it possible to perform phrasal searches “with holes”, typically
because of stopwords that have not been indexed.
- Parameters:
documentIterator
- the iterators to be composed (at least one).
gap
- an array of gaps parallel to documentIterator, or null for no gaps.
- Returns:
- a document iterator that computes the consecutive AND of documentIterator using the given gaps.
- Throws:
IOException
getComposedIntervalIterator
protected IntervalIterator getComposedIntervalIterator(Index unused)
- Specified by:
getComposedIntervalIterator in class AbstractIntersectionDocumentIterator