Class CommonTermsQuery
- java.lang.Object
-
- org.apache.lucene.search.Query
-
- org.apache.lucene.queries.CommonTermsQuery
-
public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries due to "common" terms like stopwords. This query builds two queries off the added terms: low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to significantly contribute to the document score unless at least one of the low-frequency terms is matched. This query can improve query execution times significantly if applicable.
CommonTermsQuery has several advantages over stopword filtering at index or query time, since a term can be "classified" based on the actual document frequency in the index, and it can prevent slow queries even across domains without specialized stopword files.
Note: if the query only contains high-frequency terms, the query is rewritten into a plain conjunction query, i.e. all high-frequency terms need to match in order to match a document.
-
-
Field Summary
Fields (Modifier and Type, Field):
protected float highFreqBoost
protected float highFreqMinNrShouldMatch
protected BooleanClause.Occur highFreqOccur
protected float lowFreqBoost
protected float lowFreqMinNrShouldMatch
protected BooleanClause.Occur lowFreqOccur
protected float maxTermFrequency
protected java.util.List<Term> terms
-
Constructor Summary
Constructors (Constructor, Description):
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
    Creates a new CommonTermsQuery
-
Method Summary
Methods (Modifier and Type, Method, Description):
void add(Term term)
    Adds a term to the CommonTermsQuery
protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
void collectTermStates(IndexReader reader, java.util.List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms)
boolean equals(java.lang.Object other)
    Override and implement query instance equivalence properly in a subclass.
private boolean equalsTo(CommonTermsQuery other)
float getHighFreqBoost()
    Gets the boost used for high frequency terms.
float getHighFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getHighFreqOccur()
    Gets the BooleanClause.Occur used for high frequency terms.
float getLowFreqBoost()
    Gets the boost used for low frequency terms.
float getLowFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getLowFreqOccur()
    Gets the BooleanClause.Occur used for low frequency terms.
float getMaxTermFrequency()
    Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
java.util.List<Term> getTerms()
    Gets the list of terms.
int hashCode()
    Override and implement query hash code properly in a subclass.
private int minNrShouldMatch(float minNrShouldMatch, int numOptional)
protected Query newTermQuery(Term term, TermStates termStates)
    Builds a new TermQuery instance.
Query rewrite(IndexReader reader)
    Expert: called to re-write queries into primitive queries.
void setHighFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part.
void setLowFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part.
java.lang.String toString(java.lang.String field)
    Prints a query to a string, with field assumed to be the default field and omitted.
void visit(QueryVisitor visitor)
    Recurse through the query tree, visiting any child queries
-
Methods inherited from class org.apache.lucene.search.Query
classHash, createWeight, sameClassAs, toString
-
-
-
-
Field Detail
-
terms
protected final java.util.List<Term> terms
-
maxTermFrequency
protected final float maxTermFrequency
-
lowFreqOccur
protected final BooleanClause.Occur lowFreqOccur
-
highFreqOccur
protected final BooleanClause.Occur highFreqOccur
-
lowFreqBoost
protected float lowFreqBoost
-
highFreqBoost
protected float highFreqBoost
-
lowFreqMinNrShouldMatch
protected float lowFreqMinNrShouldMatch
-
highFreqMinNrShouldMatch
protected float highFreqMinNrShouldMatch
-
-
Constructor Detail
-
CommonTermsQuery
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
Creates a new CommonTermsQuery
- Parameters:
highFreqOccur
- BooleanClause.Occur used for high frequency terms
lowFreqOccur
- BooleanClause.Occur used for low frequency terms
maxTermFrequency
- a value in [0..1) (or absolute number >=1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
- Throws:
java.lang.IllegalArgumentException
- if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
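The two ways of reading maxTermFrequency (a fraction in [0..1) or an absolute count >=1) can be illustrated with a small standalone sketch. It does not use Lucene and is not the library's implementation: the class TermFrequencySketch, its methods, and the sample document frequencies are hypothetical, and rounding the fractional threshold up is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not part of Lucene. All names here are hypothetical. */
public class TermFrequencySketch {

    /**
     * Returns true if a term with the given document frequency would be treated
     * as high-frequency under the assumed reading of maxTermFrequency.
     */
    public static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: a plain document count.
            return docFreq > maxTermFrequency;
        }
        // Fractional threshold: a share of all documents (rounding up is an assumption).
        return docFreq > (int) Math.ceil(maxTermFrequency * (float) maxDoc);
    }

    /** Collects the terms classified as low-frequency, preserving input order. */
    public static List<String> lowFrequencyTerms(Map<String, Integer> docFreqs,
                                                 int maxDoc, float maxTermFrequency) {
        List<String> low = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (!isHighFrequency(e.getValue(), maxDoc, maxTermFrequency)) {
                low.add(e.getKey());
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 1000 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 950);
        docFreqs.put("lucene", 12);
        docFreqs.put("query", 40);
        // Fractional threshold: terms in more than 10% of documents are high-frequency.
        System.out.println(lowFrequencyTerms(docFreqs, 1000, 0.1f)); // [lucene, query]
        // Absolute threshold: terms in more than 20 documents are high-frequency.
        System.out.println(lowFrequencyTerms(docFreqs, 1000, 20f));  // [lucene]
    }
}
```

In this sketch the same term ("query") moves between groups depending on the threshold, which is the effect the class description attributes to classifying terms by actual document frequency rather than by a fixed stopword list.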
-
-
Method Detail
-
add
public void add(Term term)
Adds a term to the CommonTermsQuery
- Parameters:
term
- the term to add
-
rewrite
public Query rewrite(IndexReader reader) throws java.io.IOException
Description copied from class: Query
Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
-
visit
public void visit(QueryVisitor visitor)
Description copied from class: Query
Recurse through the query tree, visiting any child queries
-
calcLowFreqMinimumNumberShouldMatch
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
-
calcHighFreqMinimumNumberShouldMatch
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
-
minNrShouldMatch
private final int minNrShouldMatch(float minNrShouldMatch, int numOptional)
-
buildQuery
protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
-
collectTermStates
public void collectTermStates(IndexReader reader, java.util.List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws java.io.IOException
- Throws:
java.io.IOException
-
setLowFreqMinimumNumberShouldMatch
public void setLowFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low frequent clause or a number >=1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min
- the number of optional clauses that must match
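How a float min might be turned into a clause count can be sketched as follows. This is a minimal standalone illustration, not the library's code: the class MinShouldMatchSketch and the method resolve are hypothetical names, and rounding the fractional case to the nearest integer is an assumption about the private minNrShouldMatch helper.

```java
/** Illustrative only: not part of Lucene. Names and rounding rule are assumptions. */
public class MinShouldMatchSketch {

    /**
     * Resolves a "minimum should match" setting to a number of clauses.
     *
     * @param min         0 (no minimum), a fraction in (0..1), or an absolute count >= 1
     * @param numOptional number of optional clauses actually present in the query part
     */
    public static int resolve(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            // Absolute count (or zero): use the value as given.
            return (int) min;
        }
        // Fraction: scale by the number of optional clauses and round to nearest.
        return Math.round(min * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(resolve(0.5f, 4)); // 2: half of four optional clauses
        System.out.println(resolve(2f, 5));   // 2: absolute number of clauses
        System.out.println(resolve(0f, 5));   // 0: no optional clause is required
    }
}
```

Under these assumptions a fractional setting adapts to however many terms end up in the clause, while an absolute setting stays fixed regardless of query length.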
-
getLowFreqMinimumNumberShouldMatch
public float getLowFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
-
setHighFreqMinimumNumberShouldMatch
public void setHighFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high frequent clause or a number >=1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min
- the number of optional clauses that must match
-
getHighFreqMinimumNumberShouldMatch
public float getHighFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
-
getTerms
public java.util.List<Term> getTerms()
Gets the list of terms.
-
getMaxTermFrequency
public float getMaxTermFrequency()
Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
-
getLowFreqOccur
public BooleanClause.Occur getLowFreqOccur()
Gets the BooleanClause.Occur used for low frequency terms.
-
getHighFreqOccur
public BooleanClause.Occur getHighFreqOccur()
Gets the BooleanClause.Occur used for high frequency terms.
-
getLowFreqBoost
public float getLowFreqBoost()
Gets the boost used for low frequency terms.
-
getHighFreqBoost
public float getHighFreqBoost()
Gets the boost used for high frequency terms.
-
toString
public java.lang.String toString(java.lang.String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
-
hashCode
public int hashCode()
Description copied from class: Query
Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
- Specified by:
hashCode in class Query
- See Also:
Query.equals(Object)
-
equals
public boolean equals(java.lang.Object other)
Description copied from class: Query
Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly. Typically a query will be equal to another only if it's an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.
- Specified by:
equals in class Query
- See Also:
Query.sameClassAs(Object), Query.classHash()
-
equalsTo
private boolean equalsTo(CommonTermsQuery other)
-
newTermQuery
protected Query newTermQuery(Term term, TermStates termStates)
Builds a new TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
- Parameters:
term
- term
termStates
- the TermStates to be used to create the low level term query. Can be null.
- Returns:
new TermQuery instance
-
-