Package xappy :: Module searchconnection :: Class SearchConnection

Class SearchConnection

object --+
         |
        SearchConnection

A connection to the search engine for searching.

The connection accesses a view of the database as of a particular flushed revision; call reopen() to update it to the latest flushed revision.



Nested Classes
  ExpandDecider
Instance Methods
 
__init__(self, indexpath)
Create a new connection to the index for searching.
 
__del__(self)
 
append_close_handler(self, handler, userdata=None)
Append a callback to the list of close handlers.
 
reopen(self)
Reopen the connection.
 
close(self)
Close the connection to the database.
 
get_doccount(self)
Count the number of documents in the database.
 
query_composite(self, operator, queries)
Build a composite query from a list of queries.
 
query_multweight(self, query, multiplier)
Build a query which modifies the weights of a subquery.
 
query_filter(self, query, filter, exclude=False)
Filter a query with another query.
 
query_adjust(self, primary, secondary)
Adjust the weights of one query with a secondary query.
 
query_range(self, field, begin, end)
Create a query for a range search.
 
query_facet(self, field, val)
Create a query for a facet value.
 
query_parse(self, string, allow=None, deny=None, default_op=0, default_allow=None, default_deny=None)
Parse a query string.
 
query_field(self, field, value, default_op=0)
A query for a single field.
 
query_similar(self, ids, allow=None, deny=None, simterms=10)
Get a query which returns documents which are similar to others.
 
significant_terms(self, ids, maxterms=10, allow=None, deny=None)
Get a set of "significant" terms for a document, or documents.
 
query_all(self)
A query which matches all the documents in the database.
 
query_none(self)
A query which matches no documents in the database.
 
spell_correct(self, querystr, allow=None, deny=None, default_op=0, default_allow=None, default_deny=None)
Correct a query spelling.
 
can_collapse_on(self, field)
Check if this database supports collapsing on a specified field.
 
can_sort_on(self, field)
Check if this database supports sorting on a specified field.
 
search(self, query, startrank, endrank, checkatleast=0, sortby=None, collapse=None, gettags=None, getfacets=None, allowfacets=None, denyfacets=None, usesubfacets=None, percentcutoff=None, weightcutoff=None, query_type=None)
Perform a search, for documents matching a query.
 
iterids(self)
Get an iterator which returns all the ids in the database.
 
get_document(self, id)
Get the document with the specified unique ID.
 
iter_synonyms(self, prefix='')
Get an iterator over the synonyms.
 
get_metadata(self, key)
Get an item of metadata stored in the connection.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables
  OP_AND = 0
  OP_OR = 1
Properties

Inherited from object: __class__

Method Details

__init__(self, indexpath)
(Constructor)

Create a new connection to the index for searching.

There may be an arbitrary number of search connections for a particular database open at a given time (regardless of whether there is a connection for indexing open as well).

If the database doesn't exist, an exception will be raised.

Overrides: object.__init__

append_close_handler(self, handler, userdata=None)

Append a callback to the list of close handlers.

These will be called when the SearchConnection is closed, which happens when the close() method is called or when the SearchConnection object is deleted. Each callback is passed two arguments: the path of the index that the SearchConnection was opened on, and the userdata supplied to this method.

The handlers will be called in the order in which they were added.

The handlers are called after the connection has been closed, so they cannot prevent it from closing, and their return values are ignored. In addition, they should not raise any exceptions.
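
The close-handler rules above can be modelled in a few lines of plain Python 3. `ToyConnection` is a hypothetical stand-in, not xappy's implementation; it assumes only what is stated above: handlers run in registration order, after the connection is marked closed, receive the index path and the userdata, have their return values ignored, and only the first close() call has any effect.

```python
class ToyConnection:
    """Hypothetical stand-in illustrating the documented close-handler rules."""

    def __init__(self, indexpath):
        self._indexpath = indexpath
        self._close_handlers = []
        self._closed = False

    def append_close_handler(self, handler, userdata=None):
        # Handlers are stored in the order in which they are added.
        self._close_handlers.append((handler, userdata))

    def close(self):
        # Only the first call has any effect.
        if self._closed:
            return
        # The connection is marked closed before any handler runs,
        # so no handler can prevent the close.
        self._closed = True
        handlers, self._close_handlers = self._close_handlers, []
        for handler, userdata in handlers:
            handler(self._indexpath, userdata)  # return value is ignored


calls = []
conn = ToyConnection("/tmp/index")
conn.append_close_handler(lambda path, data: calls.append((path, data)), "first")
conn.append_close_handler(lambda path, data: calls.append((path, data)), "second")
conn.close()
conn.close()  # second call is a no-op
print(calls)  # [('/tmp/index', 'first'), ('/tmp/index', 'second')]
```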

reopen(self)

Reopen the connection.

This updates the revision of the index which the connection references to the latest flushed revision.

close(self)

Close the connection to the database.

It is important to call this method before allowing the object to be garbage collected, to ensure that the connection is cleaned up promptly.

No other methods may be called on the connection after this has been called. (It is permissible to call close() multiple times, but only the first call will have any effect.)

If an exception occurs, the database will be closed, but changes since the last call to flush may be lost.

get_doccount(self)

Count the number of documents in the database.

This count will include documents which have been added or removed but not yet flushed.

query_composite(self, operator, queries)

Build a composite query from a list of queries.

The queries are combined with the supplied operator, which is either SearchConnection.OP_AND or SearchConnection.OP_OR.
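
Ignoring weights, the matching behaviour of the two operators can be pictured as set operations on document IDs. The sketch below is a toy model (the function name `composite_matches` is hypothetical); it reuses the OP_AND = 0 and OP_OR = 1 values listed under Class Variables, but real queries also combine relevance weights, which this ignores. Treating an empty list as matching nothing is an assumption of the sketch.

```python
from functools import reduce

OP_AND = 0  # values as listed under Class Variables
OP_OR = 1


def composite_matches(operator, matching_sets):
    """Return the IDs matched by combining subquery matches with operator."""
    sets = [set(s) for s in matching_sets]
    if not sets:
        return set()  # assumption: no subqueries means no matches
    if operator == OP_AND:
        return reduce(set.intersection, sets)  # must match every subquery
    if operator == OP_OR:
        return reduce(set.union, sets)  # may match any subquery
    raise ValueError("operator must be OP_AND or OP_OR")


title_hits = {"doc1", "doc2", "doc3"}
author_hits = {"doc2", "doc3", "doc4"}
print(sorted(composite_matches(OP_AND, [title_hits, author_hits])))  # ['doc2', 'doc3']
print(sorted(composite_matches(OP_OR, [title_hits, author_hits])))   # ['doc1', 'doc2', 'doc3', 'doc4']
```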

query_multweight(self, query, multiplier)

Build a query which modifies the weights of a subquery.

This produces a query which returns the same documents as the subquery, and in the same order, but with the weights assigned to each document multiplied by the value of "multiplier". "multiplier" may be any floating point value, but negative values will be clipped to 0, since Xapian doesn't support negative weights.

This can be useful when producing queries to be combined with query_composite, because it allows the relative importance of parts of the query to be adjusted.
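
The weight arithmetic is simple enough to state directly. In this sketch a query's results are modelled as a list of `(docid, weight)` pairs in rank order, a hypothetical representation rather than xappy's result objects; the multiplier is clipped at 0 and the document order is left unchanged.

```python
def multweight(ranked, multiplier):
    """Scale each weight by multiplier, clipping negative multipliers to 0."""
    factor = max(float(multiplier), 0.0)  # Xapian does not support negative weights
    # Same documents, same order; only the weights change.
    return [(docid, weight * factor) for docid, weight in ranked]


ranked = [("doc1", 2.0), ("doc2", 1.5)]
print(multweight(ranked, 3))   # [('doc1', 6.0), ('doc2', 4.5)]
print(multweight(ranked, -1))  # [('doc1', 0.0), ('doc2', 0.0)]
```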

query_filter(self, query, filter, exclude=False)

Filter a query with another query.

If exclude is False (or not specified), documents will only match the resulting query if they match both the first and the second query: the results of the first query are "filtered" to include only those which also match the second query.

If exclude is True, documents will only match the resulting query if they match the first query, but not the second query: the results of the first query are "filtered" to only include those which do not match the second query.

Documents will always be weighted according to only the first query.

  • query: The query to filter.
  • filter: The filter to apply to the query.
  • exclude: If True, the sense of the filter is reversed - only documents which do not match the second query will be returned.
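
The filter semantics can be modelled with the same kind of `(docid, weight)` list (again a hypothetical toy, not the library's code): the filter only decides membership, while weights and order come from the first query alone.

```python
def filter_results(ranked, filter_ids, exclude=False):
    """Keep (or, with exclude=True, drop) documents that match the filter.

    ranked: list of (docid, weight) pairs from the first query, in rank order.
    filter_ids: set of docids matched by the filter query.
    """
    filter_ids = set(filter_ids)
    return [
        (docid, weight)  # weight comes from the first query only
        for docid, weight in ranked
        if (docid in filter_ids) != exclude
    ]


ranked = [("doc1", 3.0), ("doc2", 2.0), ("doc3", 1.0)]
print(filter_results(ranked, {"doc2", "doc3"}))                # [('doc2', 2.0), ('doc3', 1.0)]
print(filter_results(ranked, {"doc2", "doc3"}, exclude=True))  # [('doc1', 3.0)]
```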

query_adjust(self, primary, secondary)

Adjust the weights of one query with a secondary query.

Documents will be returned from the resulting query if and only if they match the primary query (specified by the "primary" parameter). However, the weights (and hence, the relevance rankings) of the documents will be adjusted by adding weights from the secondary query (specified by the "secondary" parameter).
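
A toy model of this rule, using dictionaries from document ID to weight (an assumed representation): membership comes from the primary query, and the secondary query only contributes extra weight to documents the primary query already matches.

```python
def adjust(primary, secondary):
    """Return primary's documents, re-ranked with secondary's weights added."""
    combined = {
        docid: weight + secondary.get(docid, 0.0)
        for docid, weight in primary.items()
    }
    # Re-rank by the adjusted weight, highest first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


primary = {"doc1": 1.0, "doc2": 0.75}
secondary = {"doc2": 0.5, "doc3": 2.0}  # doc3 is not in primary, so it never appears
print(adjust(primary, secondary))  # [('doc2', 1.25), ('doc1', 1.0)]
```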

query_range(self, field, begin, end)

Create a query for a range search.

This creates a query which matches only those documents which have a field value in the specified range.

Begin and end must be appropriate values for the field, according to the 'type' parameter supplied to the SORTABLE action for the field.

The begin and end values are both inclusive - any documents with a value equal to begin or end will be returned (unless end is less than begin, in which case no documents will be returned).

Begin or end may be set to None in order to create an open-ended range. (They may also both be set to None, which will generate a query which matches all documents containing any value for the field.)
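
The inclusive, open-ended matching rule can be written as a small predicate. `in_range` is a hypothetical helper applied to plain Python values; in xappy the comparison is made against the field's sortable values, which this sketch does not model.

```python
def in_range(value, begin=None, end=None):
    """True if value lies in the inclusive range [begin, end].

    None for begin or end leaves that side of the range open.
    A document with no value for the field (None) never matches.
    """
    if value is None:
        return False
    if begin is not None and value < begin:
        return False
    if end is not None and value > end:
        return False
    return True


prices = {"doc1": 5, "doc2": 10, "doc3": 15, "doc4": None}
print(sorted(d for d, v in prices.items() if in_range(v, 5, 10)))     # ['doc1', 'doc2']
print(sorted(d for d, v in prices.items() if in_range(v, 10, None)))  # ['doc2', 'doc3']
print(sorted(d for d, v in prices.items() if in_range(v, 10, 5)))     # []
print(sorted(d for d, v in prices.items() if in_range(v)))            # ['doc1', 'doc2', 'doc3']
```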

query_facet(self, field, val)

Create a query for a facet value.

This creates a query which matches only those documents which have the specified facet value (or, for a numeric range facet, a value in the specified range).

For a numeric range facet, val should be a tuple holding the start and end of the range, or a comma-separated string holding two floating point values. For other facets, val should be the value to look for.

The start and end values are both inclusive - any documents with a value equal to start or end will be returned (unless end is less than start, in which case no documents will be returned).
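
For numeric range facets, both accepted forms of val reduce to a pair of floats. The helpers below are hypothetical (not part of xappy) and show one way to normalise the two forms and then apply the inclusive test.

```python
def parse_range_facet(val):
    """Turn a (start, end) tuple or a "start,end" string into two floats."""
    if isinstance(val, str):
        parts = val.split(",")
        if len(parts) != 2:
            raise ValueError("expected 'start,end', got %r" % (val,))
        start, end = parts
    else:
        start, end = val
    return float(start), float(end)


def matches_range_facet(value, val):
    start, end = parse_range_facet(val)
    # Both ends are inclusive; if end < start, nothing matches.
    return start <= value <= end


print(parse_range_facet("1.5,3"))         # (1.5, 3.0)
print(parse_range_facet((2, 4)))          # (2.0, 4.0)
print(matches_range_facet(3.0, "1.5,3"))  # True
print(matches_range_facet(2.0, (4, 1)))   # False
```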

query_parse(self, string, allow=None, deny=None, default_op=0, default_allow=None, default_deny=None)

Parse a query string.

This is intended for parsing queries entered by a user. If you wish to combine structured queries, it is generally better to use the other query building methods, such as query_composite (though you may wish to create parts of the query to combine with such methods with this method).

The string passed to this method can have various operators in it. In particular, it may contain field specifiers (ie, field names, followed by a colon, followed by some text to search for in that field). For example, if "author" is a field in the database, the search string could contain "author:richard", and this would be interpreted as "search for richard in the author field". By default, any fields in the database which are indexed with INDEX_EXACT or INDEX_FREETEXT will be available for field specific searching in this way - however, this can be modified using the "allow" or "deny" parameters, and also by the allow_field_specific tag on INDEX_FREETEXT fields.

Any text which isn't prefixed by a field specifier is used to search the "default set" of fields. By default, this is the full set of fields in the database which are indexed with INDEX_FREETEXT and which have the search_by_default flag set (ie, if the text is found in any of those fields, the query will match). However, this may be modified with the "default_allow" and "default_deny" parameters. (Note that fields which are indexed with INDEX_EXACT aren't allowed to be used in the default list of fields.)

  • string: The string to parse.
  • allow: A list of fields to allow in the query.
  • deny: A list of fields not to allow in the query.
  • default_op: The default operator to combine query terms with.
  • default_allow: A list of fields to search for by default.
  • default_deny: A list of fields not to search for by default.

Only one of allow and deny may be specified.

Only one of default_allow and default_deny may be specified.

If any of the entries in allow are not present in the configuration for the database, or are not specified for indexing (either as INDEX_EXACT or INDEX_FREETEXT), they will be ignored. If any of the entries in deny are not present in the configuration for the database, they will be ignored.

Returns a Query object, which may be passed to the search() method, or combined with other queries.
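
To make the field-specifier syntax concrete, here is a deliberately simplified tokeniser. It is hypothetical and not how query_parse() works internally (the real parser also handles phrases, boolean operators and more); it only splits a query string into field-specific terms and default-set words, honouring an allow list or a deny list as described above.

```python
def split_field_specifiers(string, fields, allow=None, deny=None):
    """Split a query string into (field, text) pairs and default-set words.

    fields: names of fields available for field-specific searching.
    Only one of allow and deny may be given, mirroring query_parse().
    """
    if allow is not None and deny is not None:
        raise ValueError("Cannot specify both allow and deny")
    usable = set(fields)
    if allow is not None:
        usable &= set(allow)  # unknown names in allow are simply ignored
    if deny is not None:
        usable -= set(deny)

    specific, default = [], []
    for word in string.split():
        field, sep, text = word.partition(":")
        if sep and field in usable and text:
            specific.append((field, text))
        else:
            # Not a usable field specifier: treat it as default-set text.
            default.append(word)
    return specific, default


print(split_field_specifiers("author:richard python", ["author", "title"]))
# ([('author', 'richard')], ['python'])
print(split_field_specifiers("author:richard python", ["author", "title"], deny=["author"]))
# ([], ['author:richard', 'python'])
```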

query_similar(self, ids, allow=None, deny=None, simterms=10)

Get a query which returns documents which are similar to others.

The list of document IDs to base the similarity search on is given in ids. This should be an iterable of strings. If any of the supplied IDs cannot be found in the database, they will be ignored. (If no IDs can be found in the database, the resulting query will not match any documents.)

By default, all fields which have been indexed for freetext searching will be used for the similarity calculation. The list of fields used for this can be customised using the allow and deny parameters (only one of which may be specified):

  • allow: A list of fields to base the similarity calculation on.
  • deny: A list of fields not to base the similarity calculation on.
  • simterms: Number of terms to use for the similarity calculation.

For convenience, any of ids, allow, or deny may be strings, which will be treated the same as a list of length 1.

Regardless of the setting of allow and deny, only fields which have been indexed for freetext searching will be used for the similarity measure - all other fields will always be ignored for this purpose.
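
The argument conventions for ids, allow and deny can be captured in a small helper. This is a sketch under stated assumptions: `freetext_fields` is a hypothetical stand-in for whatever the database configuration records as indexed for freetext searching, and the helper names are invented for illustration.

```python
def as_list(value):
    """Treat a single string as a list of length 1; pass other iterables through."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value]
    return list(value)


def similarity_fields(freetext_fields, allow=None, deny=None):
    """Choose which fields feed the similarity calculation."""
    allow, deny = as_list(allow), as_list(deny)
    if allow is not None and deny is not None:
        raise ValueError("Cannot specify both allow and deny")
    fields = list(freetext_fields)  # only freetext fields are ever considered
    if allow is not None:
        fields = [f for f in fields if f in allow]
    if deny is not None:
        fields = [f for f in fields if f not in deny]
    return fields


freetext = ["title", "body", "summary"]
print(as_list("doc1"))                                  # ['doc1']
print(similarity_fields(freetext, allow="title"))       # ['title']
print(similarity_fields(freetext, deny=["summary"]))    # ['title', 'body']
print(similarity_fields(freetext, allow=["category"]))  # [] (non-freetext fields are ignored)
```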

significant_terms(self, ids, maxterms=10, allow=None, deny=None)

Get a set of "significant" terms for a document, or documents.

This has a similar interface to query_similar(): it takes a list of ids, and an optional specification of a set of fields to consider. Instead of returning a query, it returns a list of terms from the document (or documents), which appear "significant". Roughly, in this situation significant means that the terms occur more frequently in the specified document than in the rest of the corpus.

The list is in decreasing order of "significance".

By default, all terms related to fields which have been indexed for freetext searching will be considered for the list of significant terms. The list of fields used for this can be customised using the allow and deny parameters (only one of which may be specified):

  • allow: A list of fields to consider.
  • deny: A list of fields not to consider.

For convenience, any of ids, allow, or deny may be strings, which will be treated the same as a list of length 1.

Regardless of the setting of allow and deny, only fields which have been indexed for freetext searching will be considered - all other fields will always be ignored for this purpose.

The maximum number of terms to return may be specified by the maxterms parameter.
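
The rough definition of "significant" given above (terms that occur more frequently in the chosen documents than in the rest of the corpus) can be illustrated with a simple frequency ratio. This is a toy scoring rule chosen for clarity, not the weighting xappy actually uses, so its ordering will not generally match real output; `toy_significant_terms` is a hypothetical name.

```python
from collections import Counter


def toy_significant_terms(doc_terms, corpus_terms, maxterms=10):
    """Rank terms by how over-represented they are in the chosen documents.

    doc_terms: list of terms from the selected document(s).
    corpus_terms: list of terms from the whole corpus, including those
    documents (so every document term has a non-zero corpus count).
    """
    doc_counts = Counter(doc_terms)
    corpus_counts = Counter(corpus_terms)
    doc_total = len(doc_terms)
    corpus_total = len(corpus_terms)

    def score(term):
        # Relative frequency in the documents over relative frequency overall.
        return (doc_counts[term] / doc_total) / (corpus_counts[term] / corpus_total)

    # Decreasing significance; ties broken alphabetically for stable output.
    ranked = sorted(doc_counts, key=lambda t: (-score(t), t))
    return ranked[:maxterms]


doc = ["xapian", "search", "xapian", "the"]
corpus = doc + ["the", "the", "search", "python"]
print(toy_significant_terms(doc, corpus))              # ['xapian', 'search', 'the']
print(toy_significant_terms(doc, corpus, maxterms=1))  # ['xapian']
```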

query_none(self)

A query which matches no documents in the database.

This may be useful as a placeholder in various situations.

spell_correct(self, querystr, allow=None, deny=None, default_op=0, default_allow=None, default_deny=None)

Correct a query spelling.

This returns a version of the query string with any misspelt words corrected.

  • querystr: The query string to correct.
  • allow: A list of fields to allow in the query.
  • deny: A list of fields not to allow in the query.
  • default_op: The default operator to combine query terms with.
  • default_allow: A list of fields to search for by default.
  • default_deny: A list of fields not to search for by default.

Only one of allow and deny may be specified.

Only one of default_allow and default_deny may be specified.

If any of the entries in allow are not present in the configuration for the database, or are not specified for indexing (either as INDEX_EXACT or INDEX_FREETEXT), they will be ignored. If any of the entries in deny are not present in the configuration for the database, they will be ignored.

Note that it is possible that the resulting spell-corrected query will still match no documents, so the caller should usually check that the corrected query matches some documents before suggesting it to users.
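
The check recommended above can be wrapped in a helper. In this sketch all names are hypothetical and the two callables are supplied by the caller: `correct` would typically wrap spell_correct(), and `has_matches` would run the corrected string through query_parse() and search() and report whether anything came back. The stand-ins at the bottom exist only to exercise the logic.

```python
def suggest_correction(querystr, correct, has_matches):
    """Return a corrected query string worth showing to the user, or None.

    correct: callable mapping a query string to its spell-corrected form.
    has_matches: callable returning True if a query string matches any document.
    """
    corrected = correct(querystr)
    if corrected == querystr:
        return None  # nothing was corrected
    if not has_matches(corrected):
        return None  # the correction would still find nothing
    return corrected


# Stand-ins for a real connection, used here only to exercise the logic.
FIXES = {"pyhton": "python", "xapain": "xapian"}
INDEXED = {"python"}


def correct(q):
    return " ".join(FIXES.get(w, w) for w in q.split())


def has_matches(q):
    return any(w in INDEXED for w in q.split())


print(suggest_correction("pyhton", correct, has_matches))  # python
print(suggest_correction("xapain", correct, has_matches))  # None
```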

search(self, query, startrank, endrank, checkatleast=0, sortby=None, collapse=None, gettags=None, getfacets=None, allowfacets=None, denyfacets=None, usesubfacets=None, percentcutoff=None, weightcutoff=None, query_type=None)

Perform a search, for documents matching a query.

  • query is the query to perform.
  • startrank is the rank of the start of the range of matching documents to return (ie, the result with this rank will be returned). Ranks start at 0, which represents the "best" matching document.
  • endrank is the rank at the end of the range of matching documents to return. This is exclusive, so the result with this rank will not be returned.
  • checkatleast is the minimum number of results to check for: the estimate of the total number of matches will always be exact if the number of matches is less than checkatleast. A value of -1 can be specified for the checkatleast parameter - this has the special meaning of "check all matches", and is equivalent to passing the result of get_doccount().
  • sortby is the name of a field to sort by. It may be preceded by a '+' or a '-' to indicate ascending or descending order (respectively). If the first character is neither '+' nor '-', the sort will be in ascending order.
  • collapse is the name of a field to collapse the result documents on. If this is specified, there will be at most one result in the result set for each value of the field.
  • gettags is the name of a field to count tag occurrences in, or a list of fields to do so.
  • getfacets is a boolean - if True, the matching documents will be examined to build up a list of the facet values contained in them.
  • allowfacets is a list of the fieldnames of facets to consider.
  • denyfacets is a list of fieldnames of facets which will not be considered.
  • usesubfacets is a boolean - if True, only top-level facets and subfacets of facets appearing in the query are considered (taking precedence over allowfacets and denyfacets).
  • percentcutoff is the minimum percentage a result must have to be returned.
  • weightcutoff is the minimum weight a result must have to be returned.
  • query_type is a value indicating the type of query being performed. If not None, the value is used to influence which facets are returned by the get_suggested_facets() function. If the value of getfacets is False, it has no effect.

If neither 'allowfacets' nor 'denyfacets' is specified, all fields holding facets will be considered (but see 'usesubfacets').
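
Two of the parameters above follow conventions that are easy to get wrong: the half-open [startrank, endrank) window and the sortby prefix. The helpers below model both in plain Python; `parse_sortby` and `rank_window` are hypothetical names, and the code only mirrors the documented conventions rather than xappy's internals.

```python
def parse_sortby(sortby):
    """Split a sortby string into (fieldname, ascending)."""
    if sortby.startswith("-"):
        return sortby[1:], False
    if sortby.startswith("+"):
        return sortby[1:], True
    return sortby, True  # no prefix: ascending


def rank_window(page, pagesize):
    """Return (startrank, endrank) for a zero-based page number.

    startrank is inclusive and endrank exclusive, so page 0 with
    pagesize 10 covers ranks 0 to 9.
    """
    startrank = page * pagesize
    return startrank, startrank + pagesize


print(parse_sortby("-price"))  # ('price', False)
print(parse_sortby("title"))   # ('title', True)
print(rank_window(0, 10))      # (0, 10)
print(rank_window(2, 10))      # (20, 30)
```

With a real connection, the third page of ten results sorted by descending price might then be requested as `conn.search(query, *rank_window(2, 10), sortby='-price')`.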

iterids(self)

Get an iterator which returns all the ids in the database.

The unique IDs are currently returned in binary lexicographical sort order, but this should not be relied on.

Note that the iterator returned by this method may raise a xapian.DatabaseModifiedError exception if modifications are committed to the database while the iteration is in progress. If this happens, the search connection must be reopened (by calling reopen) and the iteration restarted.
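
The reopen-and-restart advice can be packaged as a retry loop. The sketch relies only on the two methods named above, iterids() and reopen(); the exception class is passed in as a parameter, so with a real connection the caller would supply `xapian.DatabaseModifiedError`. The `max_attempts` limit is an added assumption to avoid looping forever under constant writes, and `FakeConnection` is a hypothetical stand-in used only to exercise the loop.

```python
def collect_ids(conn, modified_error, max_attempts=5):
    """Return all document IDs, restarting the scan if the database changes."""
    for _ in range(max_attempts):
        try:
            return list(conn.iterids())
        except modified_error:
            conn.reopen()  # move to the latest revision, then start again
    raise RuntimeError("database kept changing; gave up after %d attempts" % max_attempts)


# A fake connection that fails once, used only to exercise the loop.
class FakeModified(Exception):
    pass


class FakeConnection:
    def __init__(self):
        self.reopened = 0

    def reopen(self):
        self.reopened += 1

    def iterids(self):
        yield "doc1"
        if self.reopened == 0:
            raise FakeModified()  # simulate a commit during iteration
        yield "doc2"


conn = FakeConnection()
print(collect_ids(conn, FakeModified))  # ['doc1', 'doc2']
print(conn.reopened)                    # 1
```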

get_document(self, id)

Get the document with the specified unique ID.

Raises a KeyError if there is no such document. Otherwise, it returns a ProcessedDocument.

iter_synonyms(self, prefix='')

Get an iterator over the synonyms.

  • prefix: if specified, only synonym keys with this prefix will be returned.

The iterator returns 2-tuples, in which the first item is the key (ie, a 2-tuple holding the term or terms which will be synonym expanded, followed by the fieldname specified (or None if no fieldname)), and the second item is a tuple of strings holding the synonyms for the first item.

These return values are suitable for the dict() builtin, so you can write things like:

>>> import xappy
>>> iconn = xappy.IndexerConnection('foo')
>>> iconn.add_synonym('foo', 'bar')
>>> iconn.add_synonym('foo bar', 'baz')
>>> iconn.add_synonym('foo bar', 'foo baz')
>>> iconn.flush()
>>> iconn.close()
>>> conn = xappy.SearchConnection('foo')
>>> dict(conn.iter_synonyms())
{('foo', None): ('bar',), ('foo bar', None): ('baz', 'foo baz')}

get_metadata(self, key)

Get an item of metadata stored in the connection.

This returns a value stored by a previous call to IndexerConnection.set_metadata.

If the value is not found, this will return the empty string.