sorting module

Base types

class whoosh.sorting.FacetType

Base class for “facets”, aspects that can be sorted/faceted.

categorizer(searcher)

Returns a Categorizer corresponding to this facet.

class whoosh.sorting.Categorizer

Base class for categorizer objects which compute a key value for a document based on certain criteria, for use in sorting/faceting.

Categorizers are created by FacetType objects through the FacetType.categorizer() method. The whoosh.searching.Searcher object passed to the categorizer method may be a composite searcher (that is, wrapping a multi-reader), but categorizers are always run per-segment, with segment-relative document numbers.

The collector will call a categorizer’s set_searcher method as it searches each segment to let the categorizer set up whatever segment-specific data it needs.

Categorizer.allow_overlap should be True if the caller should use the keys_for_id method instead of key_for_id, so that documents can be grouped into potentially overlapping groups.

key_for_id(docid)

Returns a key for the given segment-relative document number.

key_for_matcher(matcher)

Returns a key for the given matcher. The default implementation simply gets the matcher’s current document ID and calls key_for_id, but a subclass can override this if it needs information from the matcher to compute the key.

key_to_name(key)

Returns a representation of the key to be used as a dictionary key in faceting. For example, the sorting key for date fields is a large integer; this method translates it into a datetime object to make the groupings clearer.

keys_for_id(docid)

Yields a series of keys for the given segment-relative document number. This method will be called instead of key_for_id if Categorizer.allow_overlap==True.
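
The difference between key_for_id and keys_for_id can be sketched without Whoosh at all. The following is a minimal standalone illustration, not Whoosh's implementation: TagCategorizer, DOC_TAGS, and group_documents are hypothetical names, and returning the first tag from key_for_id is an assumption made only for this example.

```python
from collections import defaultdict

# Hypothetical per-segment data: segment-relative docid -> terms in a "tag" field.
DOC_TAGS = {0: ["python", "search"], 1: ["search"], 2: ["python"]}


class TagCategorizer:
    """Stand-in that mimics the key_for_id/keys_for_id protocol."""

    def __init__(self, allow_overlap=False):
        self.allow_overlap = allow_overlap

    def key_for_id(self, docid):
        # One key per document (here the first tag, an arbitrary choice).
        return DOC_TAGS[docid][0]

    def keys_for_id(self, docid):
        # Every key for the document, so it can land in several groups.
        for tag in DOC_TAGS[docid]:
            yield tag


def group_documents(categorizer, docids):
    """Group docids the way a caller might, honoring allow_overlap."""
    groups = defaultdict(list)
    for docid in docids:
        if categorizer.allow_overlap:
            for key in categorizer.keys_for_id(docid):
                groups[key].append(docid)
        else:
            groups[categorizer.key_for_id(docid)].append(docid)
    return dict(groups)


exclusive = group_documents(TagCategorizer(allow_overlap=False), [0, 1, 2])
overlapping = group_documents(TagCategorizer(allow_overlap=True), [0, 1, 2])
```

With overlap disabled, document 0 appears only under "python"; with overlap enabled, it appears under both "python" and "search".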

set_searcher(searcher, docoffset)

Called by the collector when the collector moves to a new segment. The searcher will be atomic. The docoffset is the offset of the segment’s document numbers relative to the entire index. You can use the offset to get absolute index docnums by adding the offset to segment-relative docnums.
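
The offset arithmetic can be shown with a small standalone sketch. It does not use Whoosh: AbsoluteDocnumCategorizer and SEGMENTS are hypothetical, and the loop only imitates how a collector might move from segment to segment.

```python
class AbsoluteDocnumCategorizer:
    """Hypothetical stand-in whose key is the absolute document number."""

    def __init__(self):
        self.docoffset = 0

    def set_searcher(self, searcher, docoffset):
        # Called once per segment; remember where this segment starts.
        self.docoffset = docoffset

    def key_for_id(self, docid):
        # docid is segment-relative; adding the offset makes it absolute.
        return self.docoffset + docid


# Simulated index: (docoffset, number of documents) for each segment.
SEGMENTS = [(0, 3), (3, 2), (5, 4)]

categorizer = AbsoluteDocnumCategorizer()
absolute_docnums = []
for docoffset, doccount in SEGMENTS:
    categorizer.set_searcher(None, docoffset)  # no real searcher in this sketch
    for docid in range(doccount):
        absolute_docnums.append(categorizer.key_for_id(docid))
```

Each segment numbers its documents from 0, yet the collected keys run continuously across the simulated index.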

Facet types

class whoosh.sorting.FieldFacet(fieldname, reverse=False, allow_overlap=False)

Sorts/facets by the contents of a field.

For example, to sort by the contents of the “path” field in reverse order, and facet by the contents of the “tag” field:

from whoosh.sorting import FieldFacet

paths = FieldFacet("path", reverse=True)
tags = FieldFacet("tag")
results = searcher.search(myquery, sortedby=paths, groupedby=tags)

This facet returns different categorizers based on the field type.

Parameters:
  • fieldname – the name of the field to sort/facet on.
  • reverse – if True, when sorting, reverse the sort order of this facet.
  • allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field.

class whoosh.sorting.QueryFacet(querydict, other=None, allow_overlap=False)

Sorts/facets based on the results of a series of queries.

Parameters:
  • querydict – a dictionary mapping keys to whoosh.query.Query objects.
  • other – the key to use for documents that don’t match any of the queries.
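
The role of querydict and other can be illustrated with a standalone sketch in which plain Python predicates stand in for whoosh.query.Query objects. Everything here (query_facet_key, the predicates, the sample documents) is hypothetical, and picking the first matching key is an assumption of the sketch rather than documented behavior.

```python
def query_facet_key(doc, predicates, other=None):
    """Return the key of the first predicate that matches doc, else other."""
    for key, matches in predicates.items():
        if matches(doc):
            return key
    return other


# Predicates play the role of the queries in querydict.
predicates = {
    "cheap": lambda d: d["price"] < 10,
    "expensive": lambda d: d["price"] >= 100,
}

docs = [
    {"title": "pencil", "price": 2},
    {"title": "lamp", "price": 45},
    {"title": "desk", "price": 250},
]

groups = {}
for doc in docs:
    key = query_facet_key(doc, predicates, other="mid-range")
    groups.setdefault(key, []).append(doc["title"])
```

The lamp matches neither predicate, so it is filed under the other key.
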
class whoosh.sorting.RangeFacet(fieldname, start, end, gap, hardend=False)

Sorts/facets based on numeric ranges. For textual ranges, use QueryFacet.

For example, to facet the “price” field into $100 buckets, up to $1000:

from whoosh.sorting import RangeFacet

prices = RangeFacet("price", 0, 1000, 100)
results = searcher.search(myquery, groupedby=prices)

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:
  • fieldname – the numeric field to sort/facet on.
  • start – the start of the entire range.
  • end – the end of the entire range.
  • gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
  • hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
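
The bucket rules described above (inclusive start, exclusive end, a sequence of gap sizes, and hardend clamping) can be worked through in a standalone sketch. This is an illustration of the documented semantics, not Whoosh's code; range_buckets and bucket_for are hypothetical helpers.

```python
def range_buckets(start, end, gap, hardend=False):
    """Build (low, high) buckets covering start..end."""
    gaps = list(gap) if isinstance(gap, (list, tuple)) else [gap]
    buckets = []
    low = start
    index = 0
    while low < end:
        # Past the end of the sequence, keep reusing the last gap size.
        size = gaps[min(index, len(gaps) - 1)]
        high = low + size
        if hardend and high > end:
            high = end  # clamp the last bucket to the overall end
        buckets.append((low, high))
        low = high
        index += 1
    return buckets


def bucket_for(value, buckets):
    """Find the bucket containing value (inclusive start, exclusive end)."""
    for low, high in buckets:
        if low <= value < high:
            return (low, high)
    return None


price_buckets = range_buckets(0, 1000, 100)
soft = range_buckets(0, 20, [1, 5, 10])
hard = range_buckets(0, 20, [1, 5, 10], hardend=True)
```

Here soft ends with the bucket (16, 26), which overshoots end, while hard ends with (16, 20). A value of exactly 100 falls in (100, 200), not (0, 100).
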
class whoosh.sorting.DateRangeFacet(fieldname, startdate, enddate, delta, hardend=False)

Sorts/facets based on date ranges.

For example, to facet the “birthday” field into year-sized buckets:

from datetime import datetime, timedelta
from whoosh.sorting import DateRangeFacet

startdate = datetime(1920, 1, 1)
enddate = datetime.now()
delta = timedelta(days=365)
bdays = DateRangeFacet("birthday", startdate, enddate, delta)
results = searcher.search(myquery, groupedby=bdays)

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:
  • fieldname – the datetime field to sort/facet on.
  • startdate – the start of the entire range.
  • enddate – the end of the entire range.
  • delta – a timedelta object representing the size of each “bucket” in the range. This can be a sequence of timedeltas. For example, delta=[timedelta(days=1), timedelta(days=5), timedelta(days=10)] will use 1 day as the size of the first bucket, 5 days as the size of the second bucket, and 10 days as the size of all subsequent buckets.
  • hardend – if True, the end of the last bucket is clamped to the value of enddate. If False (the default), the last bucket is always delta sized, even if that means the end of the last bucket is after enddate.
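
One consequence of fixed-size timedelta buckets is that "year-sized" buckets of 365 days drift away from calendar years whenever a leap year is crossed. The standalone sketch below shows this with the standard library only. date_buckets is a hypothetical helper handling a single delta (not a sequence), not part of Whoosh.

```python
from datetime import datetime, timedelta


def date_buckets(startdate, enddate, delta, hardend=False):
    """Build (start, end) buckets, inclusive at start and exclusive at end."""
    buckets = []
    current = startdate
    while current < enddate:
        bucket_end = current + delta
        if hardend:
            bucket_end = min(bucket_end, enddate)  # clamp the last bucket
        buckets.append((current, bucket_end))
        current = bucket_end
    return buckets


start = datetime(1920, 1, 1)
end = datetime(1923, 1, 1)
buckets = date_buckets(start, end, timedelta(days=365))
clamped = date_buckets(start, end, timedelta(days=365), hardend=True)
```

Because 1920 is a leap year, the second bucket starts on 31 December 1920 rather than 1 January 1921, and every later boundary is shifted with it. If calendar-aligned groups matter, this drift is worth keeping in mind.
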
class whoosh.sorting.ScoreFacet

Uses a document’s score as a sorting criterion.

For example, to sort by the tag field, and then within that by relative score:

from whoosh.sorting import MultiFacet, ScoreFacet

tag_score = MultiFacet(["tag", ScoreFacet()])
results = searcher.search(myquery, sortedby=tag_score)

class whoosh.sorting.FunctionFacet(fn)

Lets you pass an arbitrary function that will compute the key. This may be easier than subclassing FacetType and Categorizer to set up the desired behavior.

The function is called with the arguments (searcher, docid), where the searcher may be a composite searcher, and the docid is an absolute index document number (not segment-relative).

For example, to use the number of words in the document’s “content” field as the sorting/faceting key:

from whoosh.sorting import FunctionFacet

fn = lambda s, docid: s.doc_field_length(docid, "content")
lengths = FunctionFacet(fn)
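
The (searcher, docid) calling convention can be tried out without an index. In the sketch below, FakeSearcher is a hypothetical stand-in that provides only doc_field_length, and the sort is done with plain sorted() rather than by Whoosh.

```python
class FakeSearcher:
    """Hypothetical stand-in exposing only doc_field_length()."""

    def __init__(self, lengths):
        # Maps absolute docid -> {fieldname: number of words}.
        self._lengths = lengths

    def doc_field_length(self, docid, fieldname):
        return self._lengths[docid].get(fieldname, 0)


def content_length(searcher, docid):
    """Key function with the (searcher, docid) signature described above."""
    return searcher.doc_field_length(docid, "content")


searcher = FakeSearcher({
    0: {"content": 120},
    1: {"content": 15},
    2: {"content": 64},
})

# Order the documents by the computed key, shortest content first.
docids_by_length = sorted([0, 1, 2], key=lambda docid: content_length(searcher, docid))
```

A named function such as content_length can be passed to FunctionFacet in place of the lambda, which makes the key logic easier to test in isolation.
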
class whoosh.sorting.MultiFacet(items=None)

Sorts/facets by the combination of multiple “sub-facets”.

For example, to sort by the value of the “tag” field, and then (for documents where the tag is the same) by the value of the “path” field:

from whoosh.sorting import FieldFacet, MultiFacet

facet = MultiFacet([FieldFacet("tag"), FieldFacet("path")])
results = searcher.search(myquery, sortedby=facet)

As a shortcut, you can pass strings, which are assumed to be field names and turned into FieldFacet objects:

facet = MultiFacet(["tag", "path"])

You can also use the add_* methods to add criteria to the multifacet:

from whoosh.query import TermRange

facet = MultiFacet()
facet.add_field("tag")
facet.add_field("path", reverse=True)
ranges = {"a-m": TermRange("name", "a", "m"), "n-z": TermRange("name", "n", "z")}
facet.add_query(ranges)
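
One way to picture "the combination of multiple sub-facets" is as a tuple of per-facet keys compared left to right. The standalone sketch below works on plain dictionaries. multi_key is a hypothetical helper, and treating the combined key as a tuple is an assumption of the sketch, not something this page states.

```python
docs = [
    {"tag": "b", "path": "/x"},
    {"tag": "a", "path": "/z"},
    {"tag": "a", "path": "/y"},
]


def multi_key(doc, fieldnames):
    """Combine per-field keys into one tuple, compared left to right."""
    return tuple(doc[name] for name in fieldnames)


# Sort by tag first, then by path among documents sharing a tag.
sorted_docs = sorted(docs, key=lambda d: multi_key(d, ["tag", "path"]))
sorted_paths = [d["path"] for d in sorted_docs]
```

The two "a" documents come first, ordered by path, followed by the "b" document.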

Facets object

class whoosh.sorting.Facets(x=None)

Maps facet names to FacetType objects, for creating multiple groupings of documents.

For example, to group by tag, and also group by price range:

from whoosh.sorting import Facets, RangeFacet

facets = Facets()
facets.add_field("tag")
facets.add_facet("price", RangeFacet("price", 0, 1000, 100))
results = searcher.search(myquery, groupedby=facets)

tag_groups = results.groups("tag")
price_groups = results.groups("price")

(To group by the combination of multiple facets, use MultiFacet.)

add_facet(name, facet)

Adds a FacetType object under the given name.

add_facets(facets, replace=True)

Adds the contents of the given Facets or dict object to this object.

add_field(fieldname, allow_overlap=False)

Adds a FieldFacet for the given field name (the field name is automatically used as the facet name).

add_query(name, querydict, other=None, allow_overlap=False)

Adds a QueryFacet under the given name.

Parameters:
  • name – a name for the facet.
  • querydict – a dictionary mapping keys to whoosh.query.Query objects.

items()

Returns a list of (facetname, facetobject) tuples for the facets in this object.
