[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

is this a good way to find anagrams?

travis laduke

12/8/2005 5:52:00 AM

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.


##these are here from when i was first testing
secret_word = "spine"
rack = "spine"

secret_word = secret_word.split(//)
rack = rack.split(//)
test_rack = rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack = rack.to_s
if rack.include? x
rack = rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end


puts "reading dictionary"
dict = IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack = gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end


9 Answers

mrkode@gmail.com

12/8/2005 7:32:00 AM

0

umm... what's the dictionary for?

also, I think there's an easier way to do this. with a bit of spacing and
stuff, I wrote up a 10 line solution. ;)
I also did the splitting string into array thing. Then there's two instance
methods for Array, named "sort" and "==" respectively.

An anagram is when two words contain the exact same characters, right? so if
the two splitted strings are the same when sorted, they should be anagrams.

On 12/8/05, travis laduke <wrong@socal.rr.com> wrote:
>
> it seems to me, with computers these days, this should finish
> instantly, not take like 20 seconds.
> also, please help me make my ruby more ruby-like. i'm new to ruby,
> not that i know any other language.
>
>
> ##these are here from when i was first testing
> secret_word = "spine"
> rack = "spine"
>
> secret_word = secret_word.split(//)
> rack = rack.split(//)
> test_rack = rack.to_s
>
> ##are there enough of the right letters in the rack to spell the word
> from the dictionary?
> ##i think i'm going to use funny spanish names for my methods instead
> of descriptive names.
> def rodolfo(secret_word, rack)
> secret_word.each do |x|
> rack = rack.to_s
> if rack.include? x
> rack = rack.sub(x, '')
> ##p rack, x
> else
> ##puts x, ' rack don\'t work'
> break
> end
> end
> end
>
>
> puts "reading dictionary"
> dict = IO.readlines('/usr/share/dict/words')
>
> while rack
> puts "enter rack"
> rack = gets.chomp.split(//)
>
> dict.each do |secret_word|
>
> if rodolfo(secret_word.chomp.split(//), rack)
> puts secret_word.to_s
> end
> end
> end
>
>

Bill Atkins

12/8/2005 8:02:00 AM

0

One thing you could try is inverting your algorithm. My dict/words
file has 234,000+ entries in it. Since you're looping through the dict
file and then calling rodolfo on each iteration, you have to pay the
cost of calling rodolfo 234,000 times, even though in general only a
couple of entries in the words file are relevant.

So what you could do is first find all the possible permuations of the
letters in your given word. Then loop through the word file once. If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list. This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

You could also call out to a program like aspell or ispell to check
your words for you. Those programs will do something at least as fast
as a binary search, which will take much, much less time than scanning
through the whole thing - even in the worst case, a 234,000 entry file
will be checked in 18 string comparisons, instead of 234,000.

hth,
Bill

travis laduke wrote:
> it seems to me, with computers these days, this should finish
> instantly, not take like 20 seconds.
> also, please help me make my ruby more ruby-like. i'm new to ruby,
> not that i know any other language.
>
>
> ##these are here from when i was first testing
> secret_word = "spine"
> rack = "spine"
>
> secret_word = secret_word.split(//)
> rack = rack.split(//)
> test_rack = rack.to_s
>
> ##are there enough of the right letters in the rack to spell the word
> from the dictionary?
> ##i think i'm going to use funny spanish names for my methods instead
> of descriptive names.
> def rodolfo(secret_word, rack)
> secret_word.each do |x|
> rack = rack.to_s
> if rack.include? x
> rack = rack.sub(x, '')
> ##p rack, x
> else
> ##puts x, ' rack don\'t work'
> break
> end
> end
> end
>
>
> puts "reading dictionary"
> dict = IO.readlines('/usr/share/dict/words')
>
> while rack
> puts "enter rack"
> rack = gets.chomp.split(//)
>
> dict.each do |secret_word|
>
> if rodolfo(secret_word.chomp.split(//), rack)
> puts secret_word.to_s
> end
> end
> end

Michael Fellinger

12/8/2005 8:11:00 AM

0

Hey travis,

you gave me a good time figuring this one out... after some thought i came up
with something that takes roughly 1 second to search 9,6k words for a
anagram.
of course this will go up when you add more words (a task you still can think
about)
all in all, 5 LoC... but i'm sure there are even better ways...

word = "easy".split(//).sort
anagrams = IO.readlines('/usr/share/dict/british-english').partition {|l|
l.strip!
(l.size == word.size && l.split(//).sort == word)
}[0]
anagrams.each{|x| puts x}

##### gives you:
manveru@lambda:~/ruby$ time ruby anagramma.rb
ayes
easy
yeas

real 0m1.097s
user 0m0.936s
sys 0m0.105s


Am Donnerstag, 8. Dezember 2005 06:52 schrieb travis laduke:
> it seems to me, with computers these days, this should finish
> instantly, not take like 20 seconds.
> also, please help me make my ruby more ruby-like. i'm new to ruby,
> not that i know any other language.
>
>
> ##these are here from when i was first testing
> secret_word = "spine"
> rack = "spine"
>
> secret_word = secret_word.split(//)
> rack = rack.split(//)
> test_rack = rack.to_s
>
> ##are there enough of the right letters in the rack to spell the word
> from the dictionary?
> ##i think i'm going to use funny spanish names for my methods instead
> of descriptive names.
> def rodolfo(secret_word, rack)
> secret_word.each do |x|
> rack = rack.to_s
> if rack.include? x
> rack = rack.sub(x, '')
> ##p rack, x
> else
> ##puts x, ' rack don\'t work'
> break
> end
> end
> end
>
>
> puts "reading dictionary"
> dict = IO.readlines('/usr/share/dict/words')
>
> while rack
> puts "enter rack"
> rack = gets.chomp.split(//)
>
> dict.each do |secret_word|
>
> if rodolfo(secret_word.chomp.split(//), rack)
> puts secret_word.to_s
> end
> end
> end

Bill Atkins

12/8/2005 9:02:00 AM

0

I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

:D

batkins57@gmail.com wrote:
> One thing you could try is inverting your algorithm. My dict/words
> file has 234,000+ entries in it. Since you're looping through the dict
> file and then calling rodolfo on each iteration, you have to pay the
> cost of calling rodolfo 234,000 times, even though in general only a
> couple of entries in the words file are relevant.
>
> So what you could do is first find all the possible permuations of the
> letters in your given word. Then loop through the word file once. If
> a word matches one of the permutations in your list, remove it from the
> list of words to be checked and push it onto a separate list. This
> should be significantly faster, since you calculate the combinations up
> front, and then do one scan through the list.
>
> You could also call out to a program like aspell or ispell to check
> your words for you. Those programs will do something at least as fast
> as a binary search, which will take much, much less time than scanning
> through the whole thing - even in the worst case, a 234,000 entry file
> will be checked in 18 string comparisons, instead of 234,000.
>
> hth,
> Bill
>
> travis laduke wrote:
> > it seems to me, with computers these days, this should finish
> > instantly, not take like 20 seconds.
> > also, please help me make my ruby more ruby-like. i'm new to ruby,
> > not that i know any other language.
> >
> >
> > ##these are here from when i was first testing
> > secret_word = "spine"
> > rack = "spine"
> >
> > secret_word = secret_word.split(//)
> > rack = rack.split(//)
> > test_rack = rack.to_s
> >
> > ##are there enough of the right letters in the rack to spell the word
> > from the dictionary?
> > ##i think i'm going to use funny spanish names for my methods instead
> > of descriptive names.
> > def rodolfo(secret_word, rack)
> > secret_word.each do |x|
> > rack = rack.to_s
> > if rack.include? x
> > rack = rack.sub(x, '')
> > ##p rack, x
> > else
> > ##puts x, ' rack don\'t work'
> > break
> > end
> > end
> > end
> >
> >
> > puts "reading dictionary"
> > dict = IO.readlines('/usr/share/dict/words')
> >
> > while rack
> > puts "enter rack"
> > rack = gets.chomp.split(//)
> >
> > dict.each do |secret_word|
> >
> > if rodolfo(secret_word.chomp.split(//), rack)
> > puts secret_word.to_s
> > end
> > end
> > end

Martin DeMello

12/8/2005 9:38:00 AM

0

batkins57@gmail.com wrote:
>
> So what you could do is first find all the possible permuations of the
> letters in your given word. Then loop through the word file once. If
> a word matches one of the permutations in your list, remove it from the
> list of words to be checked and push it onto a separate list. This
> should be significantly faster, since you calculate the combinations up
> front, and then do one scan through the list.

Even faster - sort the list of permutations, then do an incremental
search against the dictionary: binary search to find the first word in the
list, then repeat, setting the lower bound of your search to the
position immediately after each match when you do the next one.
O(m!*log n) rather than O(m log m * n), which for small m and large n
can lead to a speedup. It might be even faster to search for the middle
word in your permutation list, then do that recursively with the left
and right sublists, passing in the upper and lower bounds as arguments.
All this assumes an in-memory dictionary with O(1) position indexing, of
course.

martin

Christian Neukirchen

12/9/2005 10:10:00 AM

0

Michael Fellinger <m.fellinger@gmail.com> writes:

> Hey travis,
>
> you gave me a good time figuring this one out... after some thought i came up
> with something that takes roughly 1 second to search 9,6k words for a
> anagram.
> of course this will go up when you add more words (a task you still can think
> about)
> all in all, 5 LoC... but i'm sure there are even better ways...
>
> word = "easy".split(//).sort
> anagrams = IO.readlines('/usr/share/dict/british-english').partition {|l|
> l.strip!
> (l.size == word.size && l.split(//).sort == word)
> }[0]
> anagrams.each{|x| puts x}

Better use #find_all, no?

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneuk...


Brian Schröder

12/9/2005 12:05:00 PM

0

On 08/12/05, travis laduke <wrong@socal.rr.com> wrote:
> it seems to me, with computers these days, this should finish
> instantly, not take like 20 seconds.
> also, please help me make my ruby more ruby-like. i'm new to ruby,
> not that i know any other language.
>
> [snip]

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian

--
http://ruby.brian-sch...

Stringed instrument chords: http://chordlist.brian-sch...


Brian Schröder

12/9/2005 12:16:00 PM

0

On 09/12/05, Brian Schröder <ruby.brian@gmail.com> wrote:
> On 08/12/05, travis laduke <wrong@socal.rr.com> wrote:
> > it seems to me, with computers these days, this should finish
> > instantly, not take like 20 seconds.
> > also, please help me make my ruby more ruby-like. i'm new to ruby,
> > not that i know any other language.
> >
> > [snip]
>
> if you want to find lots of anagrams against the same database it
> would make sense to create an index. Make a hash that is indexed by
> the sorted letters and let it point to a list of words that yield this
> letter. After you have done this once, you can lookup anagrams in O(1)
>
> cheers,
>
> Brian
>

Here comes a short implementation:

hash =
File.read('/usr/share/dict/words').
downcase.
split(/\n/).
inject(Hash.new { | h, k | h[k] = [] }) { | h, w |
h[w.split(//).sort] << w; h }


ARGV.each do | word |
puts "Anagrams for #{word}: #{hash[word.split(//).sort].join(' ')}"
end

>>>>>
Anagrams for ruby: ruby bury ruby
Anagrams for duck: duck
Anagrams for type: type
Anagrams for truck: truck
Anagrams for brian: brain brian rabin brain
Anagrams for sort: rots sort tors
Anagrams for index: index nixed

Here are the timings:

The calculation of the index took 3.34749s

Lookup of 7 words took 0.000195s

And lookup of the words scales linearly, so for certain applications
(building an anagram-server e.g.) this may be the best possibility.

Cheers,

Brian

--
http://ruby.brian-sch...

Stringed instrument chords: http://chordlist.brian-sch...


Daniel Berger

12/10/2005 4:49:00 PM

0

batkins57@gmail.com wrote:
> I went ahead and implemented my own suggestions, and just to give you
> an idea of how fast you can get this, my version can find all valid
> anagrams of a five-letter word ("truck") given a 274,000 word
> dictionary in 0.231 seconds....real-time.

I think there's an algorithm for finding anagrams using libgmp, for
which there are Ruby bindings (on the RAA). I'll bet that's fast. :)

Dan