Re: [QUIZ] Shirt Reader (#140)

James Koppel

9/27/2007 3:14:00 AM

My solution's algorithm is fairly simple: take the words, take their pronunciations in IPA or a suitable equivalent, and then find the word whose pronunciation requires the fewest changes to turn it into the concatenated pronunciations of the terms. (I am indebted to Steve D for the knowledge that this count of changes is called the "Levenshtein distance".)

>ruby Rebus.rb (B+YOU+TEA+FULL)
BEAUTIFUL

I had plans for performing this on every combination of the synonyms of the given terms, and acquired both a thesaurus and a pronunciation dictionary from http://www.dcs.shef.ac.uk/research/i..., but, unfortunately, both had quite a few more entries than they should have. Apparently, "skim milk" is a synonym for "cheese." Additionally, according to the Moby Dictionary, "contretemps" is pronounced the way it's spelled, and "ember" is not a word, but "embers" is. Using the thesaurus would also turn a program that already takes a minute or so to run into one that takes a day. For those reasons, as well as the fact that the dictionary contains a lot that it shouldn't (what's a "Fayme"?), and because of some unintuitive pronunciation differences, my program doesn't use a thesaurus, and gets a lot of things wrong.

>ruby Rebus.rb (E+SCENT+SHELLS)
CENTRAL'S

(Yes, the unweighted Levenshtein distance to "CENTRAL'S" really is lower than to "ESSENTIALS" when using the Moby Pronunciation Dictionary.)

>ruby Rebus.rb (FAN+TASK+TICK)
FANTASKTIK

Note on my program's input: only expressions within parentheses are evaluated (allowing for multiple evaluations and leaving known terms unchanged in a multi-rebus phrase), everything must be converted to all caps, and operators must be shown. The latter is because I've implemented a crude interpretation of subtraction in rebuses.

>ruby Rebus.rb (LOVE-L)
OF

As can be seen, my program's output is regrettably only mediocre.
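To make the "+" and "-" handling concrete, here is a minimal sketch of what happens to the phoneme lists. The three entries in the toy table are written in the CMUdict style and are assumptions for illustration; they are not read from the real dictionary file, and TOY_PRONUNCIATIONS is a made-up name.

```ruby
# Toy pronunciation table (assumed CMUdict-style phonemes, not the real file).
TOY_PRONUNCIATIONS = {
  "LOVE" => %w[L AH1 V],
  "L"    => %w[EH1 L],   # a letter is stored under the pronunciation of its name
  "OF"   => %w[AH1 V]
}

# "+" concatenates the phoneme lists of the two terms.
added = TOY_PRONUNCIATIONS["L"] + TOY_PRONUNCIATIONS["OF"]

# "-" is Array#-, a set difference: every phoneme that occurs in the
# right-hand list is removed from the left-hand list, wherever it appears.
subtracted = TOY_PRONUNCIATIONS["LOVE"] - TOY_PRONUNCIATIONS["L"]

puts added.inspect
puts subtracted.inspect
puts subtracted == TOY_PRONUNCIATIONS["OF"]
```

With these assumed entries the difference happens to equal the list for OF, because the unmatched EH1 is simply ignored. The same rule would also strip an L from the middle of a longer word, which is why the interpretation is only a crude one.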
Changes that would greatly improve output include:

- some mechanism to differentiate between a letter's name and its phoneme (e.g. "aitch" versus the h-sound); at present the dictionary just uses the name
- a more restrictive dictionary and thesaurus
- modifying the levenshtein_distance method so that changes between similar vowels and consonants have a lower weight than changes between dissimilar ones

But alas, the period for this quiz is ending.

Anyway, here's my code:

#$thesaurus = File.open("mobythes.aur") do |f|
#  h = Hash.new
#  f.readlines.each do |line|
#    words=line.chomp.split(',').map{|s|s.upcase}
#    h[words[0]] = words[1..-1]
#  end
#  h
#end

#This only works if the beginning comments are manually removed
$pronunciations = File.open("cmudict0.3") do |f|
  NUMBER_OF_SYMBOLS = 39
  h = Hash.new
  lines = f.readlines
  #The first entries are punctuation symbols followed by their names;
  #key each one by the characters before the first capital letter
  lines[0...NUMBER_OF_SYMBOLS].each do |line|
    h[line[0...(line =~ /[A-Z]/)]] = line.split[1..-1]
  end
  #Every other entry is a word followed by its phonemes
  lines[NUMBER_OF_SYMBOLS..-1].each do |line|
    words = line.split(/\s+/)
    h[words.first] = words[1..-1]
  end
  h
end

$words = $pronunciations.keys

#See http://en.wikipedia.org/wiki/Levenshtein_distance#The_al...
def levenshtein_distance(a, b)
  prev_row = (0..b.length).to_a
  cur_row = [0] * b.length
  1.upto(a.length) do |i|
    cur_row[0] = i
    1.upto(b.length) do |j|
      cost = (a[i-1] == b[j-1]) ? 0 : 1
      cur_row[j] = [prev_row[j]+1, cur_row[j-1]+1, prev_row[j-1]+cost].min
    end
    prev_row = cur_row
    cur_row = cur_row.dup
  end
  prev_row.last
end

#def all_synonym_combinations(word_arr)
#  if word_arr.size == 1
#    ($thesaurus[word_arr.first] || [word_arr.first]).map{|w|[w]}
#  else
#    next_combs = all_synonym_combinations(word_arr[1..-1])
#    next_combs_length = next_combs.flatten.length
#    synonyms = ($thesaurus[word_arr[0]] || []) << word_arr[0]
#    synonyms = synonyms.inject([]){|a,w| a+=[[w]]*next_combs_length}
#    synonyms.flatten.zip(next_combs*synonyms.length
#      ).map{|a|a.flatten}
#  end
#end

rebus = ARGV[0].chomp
#Only the parenthesized expressions are evaluated
expressions = rebus.scan(/(?:\().+?(?:\))/)
expressions.each do |expression|
  ops = expression.scan(/[+-]/)
  terms = expression.gsub(/[\(\)]/,"").scan(/[^+-]+/)
  #all_synonym_combinations(terms).each do |terms|
  pronunciation = $pronunciations[terms[0]]
  terms[1..-1].each_with_index do |term,idx|
    if ops[idx] == "+"
      pronunciation += $pronunciations[term]
    else
      #Set difference is a very crude interpretation of subtraction in rebuses
      #(especially considering, in cmudict, a letter's pronunciation is that
      # of its name, not its main phoneme)
      pronunciation -= $pronunciations[term]
    end
  end
  #Rank every dictionary word by its distance to the combined pronunciation
  reps = $words.sort_by{|word| levenshtein_distance($pronunciations[word], pronunciation)}
  #Skip the best match if it is just one of the input terms
  rep = terms.index(reps[0]) ? reps[1] : reps[0]
  rebus[expression] = rep
  #end
end
puts rebus.gsub("+"," ")
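The weighted distance suggested in the list of improvements could look roughly like the sketch below. The names weighted_levenshtein_distance and substitution_cost, the VOWELS list, and the 0.5 weight are all assumptions chosen for illustration and are not part of the program above. The sketch only treats "vowel for vowel" as a similar change; a finer similarity table would be needed to handle consonants as well.

```ruby
# CMUdict-style vowel symbols without stress digits (assumed list).
VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW]

# Hypothetical weights: swapping one vowel for another is cheaper than
# any other substitution.
def substitution_cost(x, y)
  return 0 if x == y
  bx = x.sub(/\d\z/, "")  # drop a trailing stress digit, e.g. "AE1" -> "AE"
  by = y.sub(/\d\z/, "")
  (VOWELS.include?(bx) && VOWELS.include?(by)) ? 0.5 : 1
end

# Same row-by-row scheme as levenshtein_distance, with weighted substitutions.
def weighted_levenshtein_distance(a, b)
  prev_row = (0..b.length).to_a
  1.upto(a.length) do |i|
    cur_row = [i]
    1.upto(b.length) do |j|
      cur_row[j] = [prev_row[j] + 1,
                    cur_row[j-1] + 1,
                    prev_row[j-1] + substitution_cost(a[i-1], b[j-1])].min
    end
    prev_row = cur_row
  end
  prev_row.last
end

puts weighted_levenshtein_distance(%w[K AE1 T], %w[K AH1 T])    # prints 0.5
puts weighted_levenshtein_distance(%w[K AE1 T], %w[K AE1 S T])  # prints 1
```

Swapping this in for levenshtein_distance in the sort_by call would be the only change needed to try it, though whether these particular weights actually improve the output is untested.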

comp.lang.ruby
