This module implements an FST/FSA writer and reader. An FST (Finite State Transducer) stores a directed acyclic graph with values associated with the leaves. Common elements of the values are pushed inside the tree. An FST that does not store values is a regular FSA.
The format of the leaf values is pluggable using subclasses of the Values class.
Whoosh uses these structures to store a directed acyclic word graph (DAWG) for use in (at least) spell checking.
A slow but easier-to-use wrapper for FSA/DAWGs. Translates the low-level arc-based interface of GraphReader into Node objects with methods to follow edges.
Makes two graphs appear to be the union of the two graphs.
Makes two graphs appear to be the intersection of the two graphs.
Base class for cursor objects.
Returns True if the current arc leads to an accept state (the end of a valid key).
Returns True if the current arc is the last outgoing arc from the previous node.
Follows the labels in the given path, starting at the current position.
Yields the keys in the graph, starting at the current position.
Yields (key, value) tuples in an FST, starting at the current position.
Follows the current arc.
Returns True if this cursor is still active, that is it has not read past the last arc in the graph.
Returns the label bytes of the current arc.
Moves to the next outgoing arc from the previous node.
Returns a sequence of label bytes representing the next closest key in the graph.
Returns the next closest key in the graph as a single bytes object.
Returns the next closest key in the graph as a decoded unicode string.
Returns a sequence of the label bytes for the path from the root to the current arc.
Returns the label bytes for the path from the root to the current arc as a single joined bytes object.
Returns the labels of the path from the root to the current arc as a decoded unicode string.
Moves the cursor to the path represented by the given key bytes.
Returns True if the current arc leads to a stop state.
Switch to the sibling arc with the given label bytes.
Returns the value at the current arc, if reading an FST.
“A cursor-type object for navigating an FST/word graph, represented by a GraphReader object.
>>> cur = GraphReader(dawgfile).cursor()
>>> for key in cur.follow():
... print(repr(key))
The cursor “rests” on arcs in the FSA/FST graph, rather than nodes.
Represents a directed arc between two nodes in an FSA/FST graph.
The lastarc attribute is True if this is the last outgoing arc from the previous node.
Parameters: |
|
---|
Writes an FSA/FST graph to disk.
Call insert(key) to insert keys into the graph. You must insert keys in sorted order. Call close() to finish the graph and close the file.
>>> gw = GraphWriter(my_file)
>>> gw.insert("alfa")
>>> gw.insert("bravo")
>>> gw.insert("charlie")
>>> gw.close()
The graph writer can write separate graphs for multiple fields. Use start_field(name) and finish_field() to separate fields.
>>> gw = GraphWriter(my_file)
>>> gw.start_field("content")
>>> gw.insert("alfalfa")
>>> gw.insert("apple")
>>> gw.finish_field()
>>> gw.start_field("title")
>>> gw.insert("artichoke")
>>> gw.finish_field()
>>> gw.close()
Parameters: |
|
---|
Finishes the current graph and closes the underlying file.
Finishes the graph for the current field.
Inserts the given key into the graph.
Parameters: |
|
---|
Starts a new graph for the given field.
Takes a string and returns a list of bytestrings, suitable for use as a key or path in an FSA/FST graph.
Yields a series of keys in the given graph within k edit distance of text. If prefix is greater than 0, all keys must match the first prefix characters of text.
Base for classes the describe how to encode and decode FST values.
Adds the given prefix (the result of a call to common()) to the given value.
Returns the “common” part of the two values, for whatever “common” means for this class. For example, a string implementation would return the common shared prefix, for an int implementation it would return the minimum of the two numbers.
If there is no common part, this method should return None.
Returns True if v is a valid object that can be stored by this class.
Reads a value from the given file.
Skips over a value in the given file.
Subtracts the “common” part (the prefix) from the given value.
Returns a str (Python 2.x) or bytes (Python 3) representation of the given value. This is used for calculating node digests, so it should be unique but fast to calculate, and does not have to be parseable.
Writes value v to a file.
Stores integer values in an FST.
Abstract base class for value types that store sequences.
Stores bytes objects (str in Python 2.x) in an FST.
Stores array.array objects in an FST.
Stores lists of positive, increasing integers (that is, lists of integers where each number is >= 0 and each number is greater than or equal to the number that precedes it) in an FST.