Steven D'Aprano
2/7/2008 12:36:00 AM
On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote:
> Jeff Schwab wrote:
....
>> If the strings happen to be the same length, the Levenshtein distance
>> is equivalent to the Hamming distance.
....
> I'm afraid that it isn't. Using Magnus Lie Hetland's implementation:
....
> In [4]: hamdist('abcdef', 'fabcde')
> Out[4]: 6
>
> In [5]: levenshtein('abcdef', 'fabcde')
> Out[5]: 2
I can confirm Robert's results, using a different implementation of the
Levenshtein edit distance. It isn't true that Levenshtein simplifies to
Hamming when the strings are equal length.
For what it's worth, here's my implementation, which I wrote some years
ago. I doubt it's optimal.
def levenshtein(s1, s2):
"""Return the Levenshtein edit distance between two strings.
s1 and s2 should be two arbitrary length strings. Returns an
integer count of the edit distance between the strings.
"""
m, n = len(s1), len(s2)
if m == 0: return n
elif n == 0: return m
# Create an array with rows 0..m and columns 0..n.
array = [None]*(m+1)
for i in range(m+1):
array[i] = [None]*(n+1)
# Initialise the first row to 0..n.
for i in range(n+1):
array[0][i] = i
# Initialize the first column to 0..m.
for i in range(m+1):
array[i][0] = i
# Measure the differences.
for i in range(1, m+1):
c1 = s1[i-1]
for j in range(1, n+1):
c2 = s2[j-1]
cost = int(c1 != c2)
x = array[i][j-1] + 1 # Cell immediately above plus one.
y = array[i-1][j] + 1 # Cell immediately left plus one.
z = array[i-1][j-1] + cost # Cell diagonally above and
# to the left, plus the cost.
array[i][j] = min(x, y, z)
# When done, the bottom-right cell contains the Levenshtein distance.
return array[m][n]
--
Steven