John
5/16/2008 1:51:00 AM
Sup, fools?
This is the Levenshtein function I'm gankin' for my file comparison
project (see the "40 million comparison..." thread):
# Levenshtein calculator
# Author: Paul Battley (pbattley@gmail.com)
# Modified slightly by John Perkins:
# -- removed $KCODE call
def distance(str1, str2)
  unpack_rule = 'C*'            # compare the strings byte by byte
  s = str1.unpack(unpack_rule)
  t = str2.unpack(unpack_rule)
  n = s.length
  m = t.length
  return m if (0 == n) # stop the madness if either string is empty
  return n if (0 == m)
  d = (0..m).to_a               # previous row, overwritten in place row by row
  x = nil
  (0...n).each do |i|
    e = i + 1                   # current row's value at column j (column 0 is i + 1)
    (0...m).each do |j|
      cost = (s[i] == t[j]) ? 0 : 1
      x = [
        d[j + 1] + 1, # insertion
        e + 1,        # deletion
        d[j] + cost   # substitution
      ].min
      d[j] = e                  # store the current row's value for column j
      e = x
    end
    d[m] = x
  end
  return x
end
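A quick way to check the function itself under each interpreter is to run it on pairs whose distances are well known (for example, "kitten" vs "sitting" is 3). The sketch below repeats the same algorithm so it runs on its own; the test pairs are illustrative additions, not part of the original project.

```ruby
# Same two-row-in-one-array Levenshtein algorithm as above,
# repeated here so this example is self-contained.
def distance(str1, str2)
  s = str1.unpack('C*')   # raw bytes of each string
  t = str2.unpack('C*')
  n = s.length
  m = t.length
  return m if n == 0
  return n if m == 0
  d = (0..m).to_a         # previous row, overwritten in place
  x = nil
  (0...n).each do |i|
    e = i + 1             # current row, starting at column 0
    (0...m).each do |j|
      cost = (s[i] == t[j]) ? 0 : 1
      x = [d[j + 1] + 1, e + 1, d[j] + cost].min
      d[j] = e
      e = x
    end
    d[m] = x
  end
  x
end

# Illustrative pairs with well-known distances. If 1.8 and 1.9 print
# different results here, the function is the suspect; if they agree,
# the inputs are.
[['kitten', 'sitting', 3],
 ['flaw', 'lawn', 2],
 ['', 'abc', 3],
 ['same', 'same', 0]].each do |a, b, expected|
  puts "#{a.inspect} vs #{b.inspect}: #{distance(a, b)} (expected #{expected})"
end
```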
When I ran this on my test data under Ruby 1.8, the output was 969,
but the same data on a 1.9 install gave 1011. I'm aware that some of
the rules have changed in 1.9, especially with arrays. Does anyone see
where the discrepancy lies? Because I sure as heck don't. The files
didn't change, so the distance shouldn't either. Thanks in advance for
all your help.