Class CollectionStatistics


  • public class CollectionStatistics
    extends java.lang.Object
    Contains statistics for a collection (field).

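    This class holds statistics across all documents for scoring purposes:

    • maxDoc(): total number of documents.
    • docCount(): number of documents that have at least one term for this field.
    • sumDocFreq(): number of posting list entries for this field.
    • sumTotalTermFreq(): number of tokens for this field.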

    The following conditions are always true:

    • All statistics are positive integers: never zero or negative.
    • docCount <= maxDoc
    • docCount <= sumDocFreq <= sumTotalTermFreq

    Values may include statistics on deleted documents that have not yet been merged away.

    Be careful when performing calculations on these values: they are represented as 64-bit integers, so you may need to cast them to double before dividing or combining them.
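
    As an illustration of the casting advice above, here is a minimal sketch of an average-field-length calculation. The class and method names (AvgFieldLength, avgFieldLength) are hypothetical and not part of this API; the arguments stand for the values returned by sumTotalTermFreq() and docCount().

```java
// Hypothetical helper, not part of Lucene: shows why the cast to double matters.
final class AvgFieldLength {
    private AvgFieldLength() {}

    /**
     * Average number of tokens per document that has the field.
     * Arguments are assumed to come from sumTotalTermFreq() and docCount(),
     * so docCount is expected to be at least 1.
     */
    static double avgFieldLength(long sumTotalTermFreq, long docCount) {
        // Cast before dividing: long / long would silently drop the fraction.
        return (double) sumTotalTermFreq / (double) docCount;
    }
}
```

    With these definitions, avgFieldLength(10L, 4L) yields 2.5, whereas the plain long expression 10L / 4L yields 2.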

    • Constructor Summary

      Constructors 
      Constructor Description
      CollectionStatistics​(java.lang.String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq)
      Creates statistics instance for a collection (field).
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      long docCount()
      The total number of documents that have at least one term for this field.
      java.lang.String field()
      The field's name.
      long maxDoc()
      The total number of documents, regardless of whether they all contain values for this field.
      long sumDocFreq()
      The total number of posting list entries for this field.
      long sumTotalTermFreq()
      The total number of tokens for this field.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • field

        private final java.lang.String field
      • maxDoc

        private final long maxDoc
      • docCount

        private final long docCount
      • sumTotalTermFreq

        private final long sumTotalTermFreq
      • sumDocFreq

        private final long sumDocFreq
    • Constructor Detail

      • CollectionStatistics

        public CollectionStatistics​(java.lang.String field,
                                    long maxDoc,
                                    long docCount,
                                    long sumTotalTermFreq,
                                    long sumDocFreq)
        Creates statistics instance for a collection (field).
        Parameters:
        field - field's name.
        maxDoc - total number of documents.
        docCount - number of documents containing the field.
        sumTotalTermFreq - number of tokens in the field.
        sumDocFreq - number of posting list entries for the field.
        Throws:
        java.lang.IllegalArgumentException - if maxDoc is negative or zero.
        java.lang.IllegalArgumentException - if docCount is negative or zero.
        java.lang.IllegalArgumentException - if docCount is more than maxDoc.
        java.lang.IllegalArgumentException - if sumDocFreq is less than docCount.
        java.lang.IllegalArgumentException - if sumTotalTermFreq is less than sumDocFreq.
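
        The documented exceptions can be read as a set of argument checks. The sketch below mirrors those five conditions in a standalone helper so they can be tried without an index; CollectionStatsChecks, check and isValid are hypothetical names, and this is not the library's actual implementation.

```java
// Hypothetical stand-in that mirrors the documented constructor conditions.
final class CollectionStatsChecks {
    private CollectionStatsChecks() {}

    /** Throws IllegalArgumentException for each condition listed under "Throws". */
    static void check(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive, got " + maxDoc);
        }
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive, got " + docCount);
        }
        if (docCount > maxDoc) {
            throw new IllegalArgumentException("docCount must not exceed maxDoc");
        }
        if (sumDocFreq < docCount) {
            throw new IllegalArgumentException("sumDocFreq must be at least docCount");
        }
        if (sumTotalTermFreq < sumDocFreq) {
            throw new IllegalArgumentException("sumTotalTermFreq must be at least sumDocFreq");
        }
    }

    /** Convenience wrapper: true when all documented conditions hold. */
    static boolean isValid(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
        try {
            check(maxDoc, docCount, sumTotalTermFreq, sumDocFreq);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

        Note the parameter order follows the constructor: sumTotalTermFreq comes before sumDocFreq.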
    • Method Detail

      • field

        public final java.lang.String field()
        The field's name.

        This value is never null.

        Returns:
        field's name, not null
      • maxDoc

        public final long maxDoc()
        The total number of documents, regardless of whether they all contain values for this field.

        This value is always a positive number.

        Returns:
        total number of documents, in the range [1 .. Long.MAX_VALUE]
        See Also:
        IndexReader.maxDoc()
      • docCount

        public final long docCount()
        The total number of documents that have at least one term for this field.

        This value is always a positive number, and never exceeds maxDoc().

        Returns:
        total number of documents containing this field, in the range [1 .. maxDoc()]
        See Also:
        Terms.getDocCount()
      • sumTotalTermFreq

        public final long sumTotalTermFreq()
        The total number of tokens for this field. This is the "word count" for this field across all documents. It is the sum of TermStatistics.totalTermFreq() across all terms. It is also the sum of each document's field length across all documents.

        This value is always a positive number, and always at least sumDocFreq().

        Returns:
        total number of tokens in the field, in the range [sumDocFreq() .. Long.MAX_VALUE]
        See Also:
        Terms.getSumTotalTermFreq()
      • sumDocFreq

        public final long sumDocFreq()
        The total number of posting list entries for this field. This is the sum of term-document pairs: the sum of TermStatistics.docFreq() across all terms. It is also the sum of each document's unique term count for this field across all documents.

        This value is always a positive number, always at least docCount(), and never exceeds sumTotalTermFreq().

        Returns:
        number of posting list entries, in the range [docCount() .. sumTotalTermFreq()]
        See Also:
        Terms.getSumDocFreq()
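
        To make the relationship between the four statistics concrete, the following sketch computes them for a toy in-memory corpus using the per-document definitions given above. ToyFieldStats is a hypothetical class written for illustration only; each inner list is assumed to hold the tokens of one document's field, an empty list stands for a document without the field, and deleted documents are ignored.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical toy model, not part of Lucene.
final class ToyFieldStats {
    final long maxDoc;
    final long docCount;
    final long sumDocFreq;
    final long sumTotalTermFreq;

    private ToyFieldStats(long maxDoc, long docCount, long sumDocFreq, long sumTotalTermFreq) {
        this.maxDoc = maxDoc;
        this.docCount = docCount;
        this.sumDocFreq = sumDocFreq;
        this.sumTotalTermFreq = sumTotalTermFreq;
    }

    static ToyFieldStats compute(List<List<String>> docs) {
        long docCount = 0;
        long sumDocFreq = 0;
        long sumTotalTermFreq = 0;
        for (List<String> tokens : docs) {
            if (tokens.isEmpty()) {
                continue; // no term for this field: counts toward maxDoc only
            }
            docCount++;
            // Field length of this document: every token occurrence counts.
            sumTotalTermFreq += tokens.size();
            // Unique terms in this document: one posting list entry per term.
            sumDocFreq += new HashSet<>(tokens).size();
        }
        return new ToyFieldStats(docs.size(), docCount, sumDocFreq, sumTotalTermFreq);
    }
}
```

        For the three documents [a, b, a], [] and [b, c], this gives maxDoc = 3, docCount = 2, sumDocFreq = 4 and sumTotalTermFreq = 5, which satisfies docCount <= sumDocFreq <= sumTotalTermFreq.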
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object