Levenshtein

Constants

VERSION

Public Class Methods

distance(a1, a2, threshold=nil, options={})

Returns the Levenshtein distance between two sequences.

The two sequences can be two strings, two arrays, or two other objects responding to :each. All sequences are handled by generic (fast) C code.

All objects in the sequences should respond to :hash and :eql?.
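To illustrate what the distance means for generic sequences, here is a minimal pure-Ruby sketch. It is not the library's C implementation: sketch_items and sketch_distance are hypothetical names introduced for this example, it ignores the threshold and options arguments, and it splits strings with String#chars as a simplifying assumption.

```ruby
# Hypothetical helper (not part of the library): collects the items of a
# string or of any object responding to :each into an array.
def sketch_items(seq)
  return seq.chars if seq.is_a?(String)
  items = []
  seq.each { |item| items << item }
  items
end

# Hypothetical helper (not part of the library): plain two-row
# dynamic-programming Levenshtein distance, comparing items with eql?.
def sketch_distance(a1, a2)
  s1 = sketch_items(a1)
  s2 = sketch_items(a2)
  return s2.size if s1.empty?
  return s1.size if s2.empty?

  # previous[j] is the distance between the items of s1 seen so far
  # (before the current one) and the first j items of s2.
  previous = (0..s2.size).to_a
  s1.each_with_index do |x, i|
    current = [i + 1]
    s2.each_with_index do |y, j|
      cost = x.eql?(y) ? 0 : 1
      current << [previous[j + 1] + 1, # deletion
                  current[j] + 1,      # insertion
                  previous[j] + cost].min # substitution or match
    end
    previous = current
  end
  previous.last
end
```

With this sketch, sketch_distance("kitten", "sitting") gives 3, and sketch_distance([1, 2, 3], [1, 3]) gives 1.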

# File lib/levenshtein.rb, line 39
def self.distance(a1, a2, threshold=nil, options={})
  a1, a2      = a1.scan(/./), a2.scan(/./) if String === a1 and String === a2
  a1, a2      = Util.pool(a1, a2)

  # Handle some basic circumstances.

  return 0            if a1 == a2
  return a2.size      if a1.empty?
  return a1.size      if a2.empty?

  if threshold
    return nil        if (a1.size-a2.size) >= threshold
    return nil        if (a2.size-a1.size) >= threshold
    return nil        if (a1-a2).size >= threshold
    return nil        if (a2-a1).size >= threshold
  end

  # Remove the common prefix and the common postfix.

  l1  = a1.size
  l2  = a2.size

  offset                      = 0
  no_more_optimizations       = true
 
  while offset < l1 and offset < l2 and a1[offset].equal?(a2[offset])
    offset += 1

    no_more_optimizations     = false
  end
 
  while offset < l1 and offset < l2 and a1[l1-1].equal?(a2[l2-1])
    l1 -= 1
    l2 -= 1

    no_more_optimizations     = false
  end
 
  if no_more_optimizations
    distance_fast_or_slow(a1, a2, threshold, options)
  else
    l1 -= offset
    l2 -= offset

    a1        = a1[offset, l1]
    a2        = a2[offset, l2]

    distance(a1, a2, threshold, options)
  end
end
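
The prefix and postfix removal in the listing above can be sketched in isolation. Items shared at the start and at the end of both sequences never contribute to the distance, so only the differing middle parts need to be compared. The name sketch_trim is hypothetical, and the sketch compares items with eql? rather than equal?, on the assumption that the listing can use equal? only because Util.pool has already replaced equal items with the same object.

```ruby
# Hypothetical helper (not part of the library): strips the items that two
# arrays share at the start and at the end, returning the two middle parts.
def sketch_trim(a1, a2)
  # Advance past the common prefix.
  start = 0
  start += 1 while start < a1.size && start < a2.size && a1[start].eql?(a2[start])

  # Shrink both lengths past the common postfix, never crossing the prefix.
  l1 = a1.size
  l2 = a2.size
  while l1 > start && l2 > start && a1[l1 - 1].eql?(a2[l2 - 1])
    l1 -= 1
    l2 -= 1
  end

  [a1[start, l1 - start], a2[start, l2 - start]]
end
```

For example, sketch_trim("prefix_abc_suffix".chars, "prefix_xy_suffix".chars) leaves only the characters of "abc" and "xy" to compare.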
normalized_distance(a1, a2, threshold=nil, options={})

Returns the Levenshtein distance as a number between 0.0 and 1.0: the Levenshtein distance divided by the size of the longer sequence. Two empty sequences give 0.0. If a threshold is given and distance returns nil, the result is nil.
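
As a worked example of the arithmetic only, the sketch below normalizes a distance that has already been computed. The name sketch_normalize is hypothetical and the function takes the raw distance and the two sizes as plain numbers; it does not call the library.

```ruby
# Hypothetical helper (not part of the library): divides a precomputed
# distance by the size of the longer sequence.
def sketch_normalize(distance, size1, size2)
  longest = [size1, size2].max
  return 0.0 if longest.zero? # two empty sequences are identical
  distance.to_f / longest
end

kitten_sitting = sketch_normalize(3, 6, 7) # 3 edits over 7 items
empty_vs_four  = sketch_normalize(4, 0, 4) # every item must be inserted
```

Here kitten_sitting is 3.0/7 (about 0.43) and empty_vs_four is 1.0, the largest possible value.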

# File lib/levenshtein.rb, line 9
def self.normalized_distance(a1, a2, threshold=nil, options={})
  size        = [a1.size, a2.size].max

  if a1.size == 0 and a2.size == 0
    0.0
  elsif a1.size == 0
    a2.size.to_f/size
  elsif a2.size == 0
    a1.size.to_f/size
  else
    if threshold
      if d = self.distance(a1, a2, (threshold*size).to_i+1)
        d.to_f/size
      else
        nil
      end
    else
      self.distance(a1, a2).to_f/size
    end
  end
end

Generated with the Darkfish Rdoc Generator 2.