class Riddle::Client

This class was heavily based on the existing Client API by Dmytro Shteflyuk and Alexey Kovyrin. Their code worked fine, I just wanted something a bit more Ruby-ish (i.e. lowercase and underscored method names). I have also used a few helper classes, just to neaten things up.

Feel free to use it wherever. Send bug reports, patches, comments and suggestions to pat at freelancing-gods dot com.

Most properties of the client are accessible through attribute accessors, and where relevant use symbols instead of the long constants common in other clients. Some examples:

client.sort_mode  = :extended
client.sort_by    = "birthday DESC"
client.match_mode = :extended

To add a filter, you will need to create a Filter object:

client.filters << Riddle::Client::Filter.new("birthday",
  Time.local(1975, 1, 1).to_i..Time.local(1985, 1, 1).to_i, false)
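The filter above takes a range of integer Unix timestamps. As a minimal sketch using only core Ruby, such a range can be built and checked like this. Using Time.utc rather than local time is an assumption made here so the values do not depend on the machine's time zone, and birthday_range is just an illustrative name.

```ruby
# Build an inclusive range of Unix timestamps from 1975-01-01 to 1985-01-01.
# Time.utc is an assumption: it keeps the result independent of the local
# time zone of the machine running the code.
range_start    = Time.utc(1975, 1, 1).to_i
range_end      = Time.utc(1985, 1, 1).to_i
birthday_range = range_start..range_end

# A timestamp inside the decade falls within the range; a later one does not.
inside  = birthday_range.include?(Time.utc(1980, 6, 15).to_i)
outside = birthday_range.include?(Time.utc(1990, 1, 1).to_i)
```

The resulting range is what would be passed as the second argument to Filter.new.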

Public Class Methods

connection()
# File lib/riddle/client.rb, line 134
def self.connection
  @@connection
end
connection=(value)
# File lib/riddle/client.rb, line 128
def self.connection=(value)
  Riddle.mutex.synchronize do
    @@connection = value
  end
end
new(servers = nil, port = nil, key = nil)

A client can be instantiated with a specific server (or an array of servers) and port; otherwise it defaults to localhost and port 9312. All other settings can be read and changed via the attribute accessors.

# File lib/riddle/client.rb, line 141
def initialize(servers = nil, port = nil, key = nil)
  Riddle.version_warning

  @servers = Array(servers || "localhost")
  @port   = port || 9312
  @socket = nil
  @key    = key

  reset

  @queue = []
end
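To see how the constructor's defaults resolve, here is a small sketch that mirrors the two defaulting expressions from the source above in standalone helper methods. The method names normalise_servers and normalise_port are hypothetical and not part of Riddle, and the host names are made up.

```ruby
# Mirrors `Array(servers || "localhost")` from the constructor.
def normalise_servers(servers)
  Array(servers || "localhost")
end

# Mirrors `port || 9312` from the constructor.
def normalise_port(port)
  port || 9312
end

default_servers = normalise_servers(nil)                  # falls back to localhost
single_server   = normalise_servers("10.0.0.5")           # a string is wrapped in an array
many_servers    = normalise_servers(%w[sphinx1 sphinx2])  # an array passes through unchanged
default_port    = normalise_port(nil)                     # falls back to 9312
```

Because the servers are always stored as an array, a single host name and a list of fail-over hosts can be passed in the same way.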

Public Instance Methods

add_override(attribute, type, values)
# File lib/riddle/client.rb, line 475
def add_override(attribute, type, values)
  @overrides[attribute] = {:type => type, :values => values}
end
append_query(search, index = '*', comments = '')

Append a query to the queue. This uses the same parameters as the query method.

# File lib/riddle/client.rb, line 219
def append_query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
end
close()
# File lib/riddle/client.rb, line 489
def close
  close_socket
end
excerpts(options = {})

Build excerpts from search terms (the words) and the text of documents. Excerpts are bodies of text that have the words highlighted. They may also be abbreviated to fit within a word limit.

As part of the options hash, you will need to define:

  • :docs

  • :words

  • :index

Optional settings include:

  • :before_match (defaults to <span class="match">)

  • :after_match (defaults to </span>)

  • :chunk_separator (defaults to ' &#8230; ', which is an HTML ellipsis)

  • :limit (defaults to 256)

  • :around (defaults to 5)

  • :exact_phrase (defaults to false)

  • :single_passage (defaults to false)

The defaults differ from the official PHP client, as I’ve opted for semantic HTML markup.

Example:

client.excerpts(:docs => ["Pat Allan, Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, <span class=\"match\">Pat</span> Cash"]

lorem_lipsum = "Lorem ipsum dolor..."

client.excerpts(:docs => ["Pat Allan, #{lorem_lipsum} Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, Lorem ipsum dolor sit amet, consectetur adipisicing
       elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua &#8230; . Excepteur
       sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
       laborum. <span class=\"match\">Pat</span> Cash"]

Workflow:

Excerpt creation is completely isolated from searching the index. The nominated index is only used to discover encoding and charset information.

Therefore, the workflow goes:

  1. Do the Sphinx query.

  2. Fetch the documents found by Sphinx from their repositories.

  3. Pass the documents’ text to excerpts for marking up of matched terms.
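
The three steps can be sketched as a single helper. This is only an illustration: highlighted_results and fetch_text are hypothetical names, client is assumed to respond to query and excerpts as documented on this page, and how documents are fetched depends entirely on where your application stores them.

```ruby
# Sketch of the three-step workflow. `client` is expected to respond to
# #query and #excerpts as documented here; `fetch_text` is a hypothetical
# callable that maps a document id to that document's text.
def highlighted_results(client, terms, index, fetch_text)
  # 1. Run the Sphinx query and collect the matching document ids.
  results = client.query(terms, index)
  doc_ids = results[:matches].collect { |match| match[:doc] }
  return {} if doc_ids.empty?

  # 2. Fetch the documents' text from wherever it is stored.
  texts = doc_ids.collect { |id| fetch_text.call(id) }

  # 3. Ask Sphinx to mark up the matched terms in that text.
  excerpts = client.excerpts(:docs => texts, :words => terms, :index => index)

  # Pair each document id with its excerpt.
  Hash[doc_ids.zip(excerpts)]
end
```

The excerpts call returns one string per entry in :docs, in the same order, which is why the ids and excerpts can be zipped together.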

# File lib/riddle/client.rb, line 386
def excerpts(options = {})
  options[:index]                ||= '*'
  options[:before_match]         ||= '<span class="match">'
  options[:after_match]          ||= '</span>'
  options[:chunk_separator]      ||= ' &#8230; ' # ellipsis
  options[:limit]                ||= 256
  options[:limit_passages]       ||= 0
  options[:limit_words]          ||= 0
  options[:around]               ||= 5
  options[:exact_phrase]         ||= false
  options[:single_passage]       ||= false
  options[:query_mode]           ||= false
  options[:force_all_words]      ||= false
  options[:start_passage_id]     ||= 1
  options[:load_files]           ||= false
  options[:html_strip_mode]      ||= 'index'
  options[:allow_empty]          ||= false
  options[:passage_boundary]     ||= 'none'
  options[:emit_zones]           ||= false
  options[:load_files_scattered] ||= false

  response = Response.new request(:excerpt, excerpts_message(options))

  options[:docs].collect { response.next }
end
flush_attributes()
# File lib/riddle/client.rb, line 467
def flush_attributes
  response = Response.new request(
    :flushattrs, Message.new
  )

  response.next_int
end
keywords(query, index, return_hits = false)

Generates a keyword list for a given query. Each keyword is represented by a hash, with keys :tokenised and :normalised. If return_hits is set to true it will also report on the number of hits and documents for each keyword (see :hits and :docs keys respectively).

# File lib/riddle/client.rb, line 434
def keywords(query, index, return_hits = false)
  response = Response.new request(
    :keywords,
    keywords_message(query, index, return_hits)
  )

  (0...response.next_int).collect do
    hash = {}
    hash[:tokenised]  = response.next
    hash[:normalised] = response.next

    if return_hits
      hash[:docs] = response.next_int
      hash[:hits] = response.next_int
    end

    hash
  end
end
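
As an illustration of working with the returned keyword hashes, the sketch below picks the keyword that appears in the fewest documents. The helper name rarest_keyword is hypothetical, and the sample array is hand-written in the documented shape rather than real Sphinx output.

```ruby
# Given the array returned by keywords(query, index, true), find the keyword
# that appears in the fewest documents. Returns nil for an empty list.
def rarest_keyword(keywords)
  keywords.min_by { |keyword| keyword[:docs] }
end

# Hand-written sample data in the documented shape (not real Sphinx output).
sample_keywords = [
  {:tokenised => "pat",   :normalised => "pat",   :docs => 12, :hits => 15},
  {:tokenised => "allan", :normalised => "allan", :docs => 3,  :hits => 4}
]

rarest = rarest_keyword(sample_keywords)
rarest[:normalised] # => "allan"
```

Note that the :docs and :hits keys are only present when return_hits is true.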
open()
# File lib/riddle/client.rb, line 479
def open
  open_socket

  return if Versions[:search] < 0x116

  @socket.send [
    Commands[:persist], 0, 4, 1
  ].pack("nnNN"), 0
end
query(search, index = '*', comments = '')

Query the Sphinx daemon - defaulting to all indices, but you can name a particular one if you wish. The search parameter should be a string following Sphinx's expectations.

The object returned from this method is a hash with the following keys:

  • :matches

  • :fields

  • :attributes

  • :attribute_names

  • :words

  • :total

  • :total_found

  • :time

  • :status

  • :warning (if appropriate)

  • :error (if appropriate)

The key :matches returns an array of hashes - the actual search results. Each hash has the document id (:doc), the result weighting (:weight), and a hash of the attributes for the document (:attributes).

The :fields and :attribute_names keys return lists of the fields and attributes for the documents. The key :attributes will return a hash of attribute name and type pairs, and :words returns a hash of hashes representing the words from the search, with the number of documents and hits for each, along the lines of:

results[:words]["Pat"] #=> {:docs => 12, :hits => 15}

:total, :total_found and :time return the number of matches available, the total number of matches (which may be greater than the maximum available, depending on the number of matches and your Sphinx configuration), and the time that the query took to run, in seconds rounded to three decimal places.

:status is the error code for the query - and if there was a related warning, it will be under the :warning key. Fatal errors will be described under :error.

# File lib/riddle/client.rb, line 336
def query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
  self.run.first
end
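
To show how the keys described above fit together, here is a small sketch that turns a results hash into a one-line summary. The helper name summarise is hypothetical, and the sample hash is hand-written in the documented shape rather than real Sphinx output.

```ruby
# Summarise a results hash of the shape returned by query.
def summarise(results)
  # Fatal errors are described under :error, so report those first.
  return "error: #{results[:error]}" if results[:error]

  ids     = results[:matches].collect { |match| match[:doc] }
  summary = "#{ids.length} of #{results[:total_found]} matches: #{ids.inspect}"

  # A warning does not prevent results from being returned.
  summary += " (warning: #{results[:warning]})" if results[:warning]
  summary
end

# Hand-written sample data in the documented shape (not real Sphinx output).
sample_results = {
  :matches     => [
    {:doc => 1, :weight => 2, :attributes => {}},
    {:doc => 5, :weight => 1, :attributes => {}}
  ],
  :total       => 2,
  :total_found => 2,
  :time        => 0.001
}

summarise(sample_results) # => "2 of 2 matches: [1, 5]"
```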
reset()

Reset attributes and settings to defaults.

# File lib/riddle/client.rb, line 155
def reset
  # defaults
  @offset         = 0
  @limit          = 20
  @max_matches    = 1000
  @match_mode     = :all
  @sort_mode      = :relevance
  @sort_by        = ''
  @weights        = []
  @id_range       = 0..0
  @filters        = []
  @group_by       = ''
  @group_function = :day
  @group_clause   = '@weight DESC'
  @group_distinct = ''
  @cut_off        = 0
  @retry_count    = 0
  @retry_delay    = 0
  @anchor         = {}
  # string keys are index names, integer values are weightings
  @index_weights  = {}
  @rank_mode      = :proximity_bm25
  @rank_expr      = ''
  @max_query_time = 0
  # string keys are field names, integer values are weightings
  @field_weights  = {}
  @timeout        = 0
  @overrides      = {}
  @select         = "*"
end
run()

Run all the queries currently in the queue. This will return an array of results hashes.

# File lib/riddle/client.rb, line 225
def run
  response = Response.new request(:search, @queue)

  results = @queue.collect do
    result = {
      :matches         => [],
      :fields          => [],
      :attributes      => {},
      :attribute_names => [],
      :words           => {}
    }

    result[:status] = response.next_int
    case result[:status]
    when Statuses[:warning]
      result[:warning] = response.next
    when Statuses[:error]
      result[:error] = response.next
      next result
    end

    result[:fields] = response.next_array

    attributes = response.next_int
    attributes.times do
      attribute_name = response.next
      type           = response.next_int

      result[:attributes][attribute_name] = type
      result[:attribute_names] << attribute_name
    end

    result_attribute_names_and_types = result[:attribute_names].
      inject([]) { |array, attr| array.push([ attr, result[:attributes][attr] ]) }

    matches   = response.next_int
    is_64_bit = response.next_int

    result[:matches] = (0...matches).map do |i|
      doc = is_64_bit > 0 ? response.next_64bit_int : response.next_int
      weight = response.next_int

      current_match_attributes = {}

      result_attribute_names_and_types.each do |attr, type|
        current_match_attributes[attr] = attribute_from_type(type, response)
      end

      {:doc => doc, :weight => weight, :index => i, :attributes => current_match_attributes}
    end

    result[:total] = response.next_int.to_i || 0
    result[:total_found] = response.next_int.to_i || 0
    result[:time] = ('%.3f' % (response.next_int / 1000.0)).to_f || 0.0

    words = response.next_int
    words.times do
      word = response.next
      docs = response.next_int
      hits = response.next_int
      result[:words][word] = {:docs => docs, :hits => hits}
    end

    result
  end

  @queue.clear
  results
end
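
Since run returns one results hash per queued query, in queue order, append_query and run can be combined to send several searches in one request. The following is a sketch only: batch_search is a hypothetical helper, and client is assumed to respond to append_query and run as documented on this page.

```ruby
# Queue one query per search term, then send them all in a single request.
# `client` is expected to respond to #append_query and #run as documented here.
def batch_search(client, terms, index = '*')
  terms.each { |term| client.append_query(term, index) }

  # run returns one results hash per queued query, in queue order,
  # so each term can be paired with its own results.
  terms.zip(client.run)
end
```

The return value is an array of [term, results] pairs, which keeps duplicate terms distinct where a hash keyed by term would not.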
server()

The searchd server to query. Servers are removed from @servers after a Timeout::Error is hit to allow for fail-over.

# File lib/riddle/client.rb, line 188
def server
  @servers.first
end
server=(server)

Backwards compatible writer to the @servers array.

# File lib/riddle/client.rb, line 193
def server=(server)
  @servers = server.to_a
end
set_anchor(lat_attr, lat, long_attr, long)

Set the geo-anchor point: the names of the attributes that contain the latitude and longitude, and the reference position. Sphinx expects all of these values in radians, including the latitude and longitude returned from your SQL source. Note that for geocoding to work properly, you must also set #match_mode to :extended. To sort results by distance, you will need to set #sort_by to '@geodist asc', and #sort_mode to :extended (as an example).

Example:

client.set_anchor('lat', -0.6591741, 'long', 2.530770)
# File lib/riddle/client.rb, line 208
def set_anchor(lat_attr, lat, long_attr, long)
  @anchor = {
    :latitude_attribute   => lat_attr,
    :latitude             => lat,
    :longitude_attribute  => long_attr,
    :longitude            => long
  }
end
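
Coordinates are usually stored in degrees, so they need converting before being used as an anchor. A minimal sketch of that conversion follows; to_radians is a hypothetical helper and the degree values are illustrative assumptions that roughly correspond to the radians in the example above.

```ruby
# Convert degrees to radians, the unit Sphinx expects for both the stored
# attributes and the anchor point.
def to_radians(degrees)
  degrees * Math::PI / 180
end

# Illustrative coordinates in degrees (assumed values, roughly matching the
# radians used in the set_anchor example).
latitude  = to_radians(-37.768)
longitude = to_radians(145.002)

# With a Riddle client available, these would be passed straight through:
#   client.set_anchor('lat', latitude, 'long', longitude)
```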
status()
# File lib/riddle/client.rb, line 454
def status
  response = Response.new request(
    :status, Message.new
  )

  rows, cols = response.next_int, response.next_int

  (0...rows).inject({}) do |hash, row|
    hash[response.next.to_sym] = response.next
    hash
  end
end
update(index, attributes, values_by_doc)

Update attributes - first parameter is the relevant index, second is an array of attributes to be updated, and the third is a hash, where the keys are the document ids, and the values are arrays with the attribute values - in the same order as the second parameter.

Example:

client.update('people', ['birthday'], {1 => [Time.local(1982, 8, 20).to_i]})
# File lib/riddle/client.rb, line 421
def update(index, attributes, values_by_doc)
  response = Response.new request(
    :update,
    update_message(index, attributes, values_by_doc)
  )

  response.next_int
end
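
Because each value array must follow the order of the attributes array, it can help to build the third argument from per-document hashes keyed by attribute name. The sketch below does that; build_values_by_doc is a hypothetical helper, and the 'score' attribute and sample values are made up for illustration.

```ruby
# Build the values_by_doc hash for update from per-document attribute hashes,
# so each value array follows the order of the attributes array.
# Raises KeyError if a document is missing one of the attributes.
def build_values_by_doc(attributes, changes)
  changes.each_with_object({}) do |(doc_id, values), result|
    result[doc_id] = attributes.collect { |name| values.fetch(name) }
  end
end

attributes = ['birthday', 'score'] # 'score' is a made-up attribute
changes    = {
  1 => {'score' => 10, 'birthday' => Time.utc(1982, 8, 20).to_i},
  2 => {'birthday' => Time.utc(1990, 1, 1).to_i, 'score' => 7}
}

payload = build_values_by_doc(attributes, changes)
# payload[1] holds the birthday timestamp first and then 10, matching the
# order of attributes regardless of the key order in the input hash.

# With a Riddle client available:
#   client.update('people', attributes, payload)
```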