Eugene Kalenkovich
9/25/2007 2:53:00 AM
Here is my solution. Obviously it is not as "well educated" as Steve's one
because it does not use pronunciation dictionary, but it behaves pretty well
with what it has :)
I've tried different hashing strategies, one of them to use soundex, but
that one had one less match, though it was able to understand 're late shun
ship' - metaphone believes that relationship is pronounced differently (and
I cannot blame it :)
Soundex has a big disadvantage of requiring correct first letter, that's why
I think the metaphone-based solution is closer to what we want.
BTW, I was surprised that text gem includes only symmetrical Levenshtein
distance, asymmetrical one (with different costs of operations) could
probably give better results
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
expectations = {
%w[e scent shells] => 'essentials',
%w[q all if i] => 'qualify',
%w[fan task tick] => 'fantastic',
%w[b you tea full] => 'beautiful',
%w[fun duh mint all] => 'fundamental',
%w[s cape] => 'escape',
%w[pan z] => 'pansy',
%w[n gauge] => 'engage',
%w[cap tin] => 'captain',
%w[g rate full] => 'grateful',
%w[re late shun ship] => 'relationship',
%w[con grad yeul 8] => 'congratulate',
%w[2 burr q low sis] => 'tuberculosis',
%w[my crows cope] => 'microscope',
%w[add minus ray shun] => 'administration',
%w[accent you ate it] => 'accentuated',
%w[add van sing] => 'advancing',
%w[car knee for us] => 'carnivorous',
%w[soup or seed] => 'supercede',
%w[poor 2 bell o] => 'portobello',
%w[d pen dance] => 'dependence',
%w[s o tear rick] => 'esoteric',
%w[4 2 it us] => 'fortuitous',
%w[4 2 n 8] => 'fortunate',
%w[4 in R] => 'foreigner',
%w[naan disk clothes your] => 'nondisclosure',
%w[Granmda Atika Lee] => 'grammatically',
%w[a brie vie a shun] => 'abbreviation',
%w[pheemeeneeneetee] => 'femininity',
%w[me c c p] => 'mississippi',
%w[art fork] => 'aardvark',
%w[liberty giblet] => 'flibbertigibbet',
%w[zoo key knee] => 'zucchini',
%w[you'll tight] => 'yuletide',
%w[Luke I like] => 'lookalike',
%w[mah deux mah zeal] => 'mademoiselle',
%w[may gel omen yak] => 'megalomaniac',
%w[half tell mall eau gist] => 'ophthalmologist',
%w[whore tea cull your wrist] => 'horticulturist',
%w[pant oh my m] => 'pantomime',
%w[tear a ball] => 'terrible',
%w[a bowl i shun] => 'abolition',
%w[pre chair] => 'preacher',
%w[10 s] => 'tennis',
%w[e z] => 'easy',
%w[1 door full] => 'wonderful',
%w[a door] => 'adore',
%w[hole e] => 'holy',
%w[grand your] => 'grandeur',
%w[4 2 5] => 'fortify',
%w[age, it ate her] => 'agitator',
%w[tear it or eel] => 'territorial',
%w[s 1] => 'swan'
}
subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
'h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'e'}
%w[f l m n s x].each{|l| subs[l]='e'+l}
def metadist(str1,str2)
2*distance(metaphone(str1),metaphone(str2))+distance(str1,str2)
end
hash=Hash.new{|h,k|h[k]=[]}
File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
word=w.downcase.delete("^a-z")
m1,m2=double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
end
#make sure that expectations are in the word list
expectations.values.each { |word|
m1,m2=double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
}
inputs=[]
if (ARGV.empty?)
inputs=expectations.keys
else
inputs << ARGV
end
inputs.each { |rebus|
y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
m1,m2=double_metaphone(word)
results=hash[m1]
results+=hash[m2] if m2
res=results.uniq.sort{|a,b|
(metadist(word,a) <=> metadist(word,b)).nonzero? || a.length<=>b.length
}[0] || 'no clue'
print "'#{rebus.join(' ')}' => #{res}"
expected=expectations[rebus]
print ", expected: #{expected}" if expected
puts
}