Asp Forum - Buiding anamgrams

andrea

12/26/2008 9:31:00 AM

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTro...
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??
thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

7 Answers

Robert Klemme

12/26/2008 10:46:00 AM

On 26.12.2008 10:30, andrea wrote:
> I'm studying ruby (and I really love it even more than python) and I
> want to do a little exercise.

Welcome to the pack! What a nice Christmas occupation. :-)

> I want to create an efficient module to generate every possible
> anagrams of a word, of any given size...

Do you really mean "anagrams" or rather "permutations"? To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.

> I already done this in python, I want to see how can it be elegant in
> ruby...
> I need a good permutation algorithm (this
> http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTro...
> works fine) and I'd like to use enumerators in the best and most
> efficient way.

From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.

> I also looked at callcc which is great for backtracking, could be used
> in this case??

I would try to avoid it as it makes code very hard to read - at least
for me. :-)

> thanks for any help
>
> A basic structure like that could be fine?
> class Anagrams
> attr :word
> include Enumerable
> include Comparable
>
> def <=>(other)
> self.word <=> other.word
> end
>
> def each(&block)
> @word.chars
> end
>
> def initialize(word)
> @word = word
> end
>
> end

What exactly should be the responsibility of this class? Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Tiago Nogueira

12/26/2008 11:22:00 AM

That's it. In ruby 1.8.6 you cant use this form , only for 1.8.7 and prior
...
I tested here and the code works perfectly.
About your class:
class Anagrams
>> attr :word
>> include Enumerable
>> include Comparable
>>
>> def <=>(other)
>> self.word <=> other.word
>> end
>>
>> def each(&block)
>> @word.chars
>> end
>>
>> def initialize(word)
>> @word = word
>> end
>>
>> end
i really dont understant what it means. Explain better.
King regards
tiago

On Fri, 26 Dec 2008 19:50:00 +0900, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 26.12.2008 10:30, andrea wrote:
>> I'm studying ruby (and I really love it even more than python) and I
>> want to do a little exercise.
>
> Welcome to the pack! What a nice Christmas occupation. :-)
>
>> I want to create an efficient module to generate every possible
>> anagrams of a word, of any given size...
>
> Do you really mean "anagrams" or rather "permutations"? To decide on
> anagrams you likely need some kind of dictionary which is significantly
> more difficult than just permuting.
>
>> I already done this in python, I want to see how can it be elegant in
>> ruby...
>> I need a good permutation algorithm (this
>>
> http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTro...
>> works fine) and I'd like to use enumerators in the best and most
>> efficient way.
>
> From 1.8.7 on you can do this
>
> irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
> "abc"
> "acb"
> "bac"
> "bca"
> "cab"
> "cba"
> => ["a", "b", "c"]
> irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
> "abc"
> "acb"
> "bac"
> "bca"
> "cab"
> "cba"
> => ["a", "b", "c"]
> irb(main):011:0>
>
> Now you only need a filter that properly decides which of those is a
> proper word and use that to decide which words to print and which not.
>
>> I also looked at callcc which is great for backtracking, could be used
>> in this case??
>
> I would try to avoid it as it makes code very hard to read - at least
> for me. :-)
>
>> thanks for any help
>>
>> A basic structure like that could be fine?
>> class Anagrams
>> attr :word
>> include Enumerable
>> include Comparable
>>
>> def <=>(other)
>> self.word <=> other.word
>> end
>>
>> def each(&block)
>> @word.chars
>> end
>>
>> def initialize(word)
>> @word = word
>> end
>>
>> end
>
> What exactly should be the responsibility of this class? Especially,
> why do you call it "anagrams" but store only one word in it?
>
> Cheers
>
> robert
>
> --
> remember.guy do |as, often| as.you_can - without end

Axel Etzold

12/27/2008 12:27:00 AM

Dear Andrea,

> I want to create an efficient module to generate every possible
> anagrams of a word, of any given size...
> I already done this in python, I want to see how can it be elegant in
> ruby...
> I need a good permutation algorithm (this
> http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTro...
> works fine) and I'd like to use enumerators in the best and most
> efficient way.
>
> I also looked at callcc which is great for backtracking, could be used
> in this case??

Permutations become quite unwieldy very quickly ... maybe you'd have a look
at Hidden Markov Models instead. There are several algorithms associated
to these models. I think what you'd want to do is to apply the "forward algorithm"
to estimate how probable a given word is, given the probability of letter x2 succeeding
letter x1 in the language you are trying to form anagrams in ( ... download some novels
from project Gutenberg, split them into pairs of letters and count.....) .

For the forward algorithm, look at the concrete example here:

http://en.wikipedia.org/wiki/Viterbi...

There once was a Ruby quiz about Markov chains to help you with Ruby:

http://www.rubyquiz.com/q...

To test whether a given word exists in language X, you could use raspell in conjunction with
aspell:

http://blog.evanweaver.com/files/doc/fauna/raspell/files/R...

Best regards,

Axel
--
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/mult...

andrea

12/27/2008 8:58:00 AM

Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefu...

I'll have a look to the Markov Chain (which I'm studying right now at
uni).

Dave Thomas

12/27/2008 6:46:00 PM

On Dec 27, 2008, at 2:59 AM, andrea wrote:

> Thanks a lot everyone, now I already posted but don't know why it
> doesn't appear my message.
> Anyway you're right Axel, this is the first "solution" to the problem
> but it's really too slow for any reasonable large size of the string.
>
> This is the code, any hint about style or whatever is welcome:
> http://pastie.textmate.org/private/rp4ivw36vefu...

I think a faster approach, particularly to find multiple anagrams, is
to create a signature or hash with the property that each word in a
set of anagrams will have the same hash code. Finding anagrams then
becomes a problem of building lists of words with the hash hash code.

A convenient hash is simply to take all the letters in each work and
sort them, so that "dog" becomes "dgo" and "god" also becomes "dgo".
Because they have the same hash, they are anagrams.

Here's the code from the PickAxe[1] that uses this to find anagrams:

http://pastie.textmate.org/private/yttznlgzcf9...

and here are some simple tests:

http://pastie.textmate.org/private/stv3sxaecc6z...

In fact, I use this as an example of Gem packaging, so you can
actually do

gem install anagram

and get the whole thing.

Cheers

Dave

[1] http://pragprog.com/ti...

Justin Collins

12/28/2008 1:14:00 AM

andrea wrote:
> Thanks a lot everyone, now I already posted but don't know why it
> doesn't appear my message.
> Anyway you're right Axel, this is the first "solution" to the problem
> but it's really too slow for any reasonable large size of the string.
>
> This is the code, any hint about style or whatever is welcome:
> http://pastie.textmate.org/private/rp4ivw36vefu...
>
>
> I'll have a look to the Markov Chain (which I'm studying right now at
> uni).
>

Just for fun/reference, this is what I came up with after reading your
post earlier. Note that it doesn't do the "of any length" requirement,
though.

require 'rubygems'
require 'raspell'

def anagrams word
checker = Aspell.new
word.split(//).permutation.select { |w| checker.check w.join
}.map { |w| w.join }
end

puts anagrams "words"

-Justin

andrea

12/31/2008 1:53:00 PM

Thanks to everyone for your very nice answers, now I wrote another
version of the program.

http://pastie.textmate.org/private/ryxlzlqle8tx...

Which implments all 3 methods discussed here, the nicest of course is
the third.

It creates signature only if really needed, otherwise it just loads
the dict with Marshall. (other ways?)
Is correct the use of md5?? I think it is as it gives the same result
of the openssl md5, but it looks a little bit too fast for such a big
file.
dict_md5 = Digest::MD5.hexdigest(open(@dict).read)
conf_md5 = open($CONF).read
# then they must be equal and I have the correct
# signatures for my dictionary
if dict_md5 == conf_md5
puts "we already have the right signatures" if $DEBUG
# loading the precomputed signature file
@signatures = Marshal.load(open($SIGNATURE))
else

It's not bad at all, the only problem is that with a huge dictionary
it takes a long time to load it (around 38seconds for a almost 1-
million word dictionary) and takes a big amount of ram.

Could I maybe load it incrementally? Maybe divide the signs by size
and load just what I need to check?
Is it possible using only one file?

Thanks a lot

comp.lang.ruby

Buiding anamgrams

andrea

Robert Klemme

Tiago Nogueira

Axel Etzold

andrea

Dave Thomas

Justin Collins

andrea

x Login to ForumsZone