Asp Forum - Review of code please

Mark Woodward

1/27/2007 12:39:00 PM

Hi all,

I'm hoping for some critique of the code below.
It works but is slow and probably not 'the ruby way'

---------------------------------------------------------
# Anagrams
#
# Purpose: find all anagrams in /usr/share/dict/words
# for 4 letter words (length 5 because of \n)

# path to the word file
filename = "/usr/share/dict/words"

# check file exists and is readable
if File.file?(filename) && File.readable?(filename) then
dict_words = []
all_anagrams = {}

open(filename).each do |dict_word|
if dict_word.length == 5 then
dict_words << dict_word.chomp.strip
end
end

wrds = dict_words

dict_words.each do |wrd|
current_wrd = wrd.split(//).sort.join

anagrams = []

wrds.each do |w|
if wrd != w then
if not current_wrd == w.split(//).sort.join then
next
end
anagrams.push(w)
end
end

if not anagrams.empty? then
all_anagrams[wrd] = anagrams
end
end

p all_anagrams

end
---------------------------------------------------------

thanks,

--
Mark

7 Answers

Mark Woodward

1/27/2007 1:22:00 PM

Well!!,

On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:

> Hi all,
>
> I'm hoping for some critique of the code below.
> It works but is slow and probably not 'the ruby way'
>
> ---------------------------------------------------------
> # Anagrams
> #
> # Purpose: find all anagrams in /usr/share/dict/words
> # for 4 letter words (length 5 because of \n)
>
> # path to the word file
> filename = "/usr/share/dict/words"
>
> # check file exists and is readable
> if File.file?(filename) && File.readable?(filename) then
> dict_words = []
> all_anagrams = {}
>
> open(filename).each do |dict_word|
> if dict_word.length == 5 then
> dict_words << dict_word.chomp.strip
> end
> end
>
> wrds = dict_words
>
> dict_words.each do |wrd|
> current_wrd = wrd.split(//).sort.join
>
> anagrams = []
>
> wrds.each do |w|
> if wrd != w then
> if not current_wrd == w.split(//).sort.join then
> next
> end
> anagrams.push(w)
> end
> end
>
> if not anagrams.empty? then
> all_anagrams[wrd] = anagrams
> end
> end
>
> p all_anagrams
>
> end
> ---------------------------------------------------------
>
> thanks,

I just found a solution here:
http://www.joeygibson.com/blog/tech/ruby/...

that is exponentially faster than mine and finds all anagrams not just 4
letter words! I just need to understand it now.

---------------------------------------------------------
words = IO.readlines("/usr/share/dict/words")

anagrams = Hash.new([])

words.each do |word|
base = Array.new
word.chomp!.downcase!

word.each_byte do |byte|
base << byte.chr
end

base.sort!

anagrams[base.to_s] |= [word]
end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
b.length <=> a.length
end

File.open("anagrams.txt", "w") do |file|
values.each do |line|
file.puts(line.join(", "))
end
end

largest = anagrams.keys.max do |a, b|
a.length <=> b.length
end

puts "Total: #{anagrams.length}" #
puts "Largest set of anagrams: #{values[0].inspect}" #
print "Longest anagram: #{anagrams[largest].inspect} " #
puts "at #{largest.length} characters each"
---------------------------------------------------------

--
Mark

Rob Biedenharn

1/27/2007 2:57:00 PM

On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:
> On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:
>
>> Hi all,
>>
>> I'm hoping for some critique of the code below.
>> It works but is slow and probably not 'the ruby way'
>>
>> ---------------------------------------------------------
>> # Anagrams
>> #
>> # Purpose: find all anagrams in /usr/share/dict/words
>> # for 4 letter words (length 5 because of \n)
>>
>> # path to the word file
>> filename = "/usr/share/dict/words"
>>
>> # check file exists and is readable
>> if File.file?(filename) && File.readable?(filename) then
>> dict_words = []
>> all_anagrams = {}
>>
>> open(filename).each do |dict_word|
>> if dict_word.length == 5 then
>> dict_words << dict_word.chomp.strip
>> end
>> end
>>
>> wrds = dict_words
>>
>> dict_words.each do |wrd|
>> current_wrd = wrd.split(//).sort.join
>>
>> anagrams = []
>>
>> wrds.each do |w|
>> if wrd != w then
>> if not current_wrd == w.split(//).sort.join then
>> next
>> end
>> anagrams.push(w)
>> end
>> end
>>
>> if not anagrams.empty? then
>> all_anagrams[wrd] = anagrams
>> end
>> end
>>
>> p all_anagrams
>>
>> end
>> ---------------------------------------------------------
>>
>> thanks,
>
> I just found a solution here:
> http://www.joeygibson.com/blog/tech/ruby/...
>
> that is exponentially faster than mine and finds all anagrams not
> just 4
> letter words! I just need to understand it now.
>
> ---------------------------------------------------------
> words = IO.readlines("/usr/share/dict/words")
Slurps in the entire dictionary as an array of lines (typically much
quicker than reading a line at a time even with buffering).
>
> anagrams = Hash.new([])
Build a new Hash where the default value is an empty array
>
> words.each do |word|
> base = Array.new
> word.chomp!.downcase!
>
> word.each_byte do |byte|
> base << byte.chr
> end
Get the individual bytes in the word (after having stripped any line
ending [chomp!] and standardizing the lettercase [downcase!])...
>
> base.sort!
...and put them in a standard order (by value, the bytes are just
integers). This is essentially the same as you were doing with
current_wrd (in fact, it might be interesting to see how much this
change affects the overall speed).
>
> anagrams[base.to_s] |= [word]
construct the string representation of the array of byte values to
use as the hash key, and then use the array union operator to include
the new word.

> end
>
> # Find the anagrams by eliminating those with only one word
> anagrams.reject! {|k, v| v.length == 1}
>
> values = anagrams.values.sort do |a, b|
> b.length <=> a.length
> end
Largest sets first.
>
> File.open("anagrams.txt", "w") do |file|
> values.each do |line|
> file.puts(line.join(", "))
> end
> end
>
> largest = anagrams.keys.max do |a, b|
> a.length <=> b.length
> end
>
> puts "Total: #{anagrams.length}" #
> puts "Largest set of anagrams: #{values[0].inspect}" #
> print "Longest anagram: #{anagrams[largest].inspect} " #
> puts "at #{largest.length} characters each"
> ---------------------------------------------------------
>
> --
> Mark

Does that make more sense to you? (I nearly ignored your message,
but your second post showed that you were really interested in
understanding, not just trying to get others to do your code review ;-)

-Rob

Rob Biedenharn http://agileconsult...
Rob@AgileConsultingLLC.com

Erik Veenstra

1/27/2007 6:30:00 PM

Here's a map-reduce-ish [1] solution.

I don't say it's better, or faster, or smaller (memory-wise).
It's just, uh, different...

(Sorry, couldn't resist, again... ;])

gegroet,
Erik V. - http://www.erikve...

[1] http://labs.google.com/papers/mapr...

----------------------------------------------------------------

WITHOUT THE TOTALS

File.open("anagrams.txt", "w") do |f|
File.readlines("/usr/share/dict/words").collect do |word|
word.chomp.downcase
end.collect do |word|
[word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
(hash[base] ||= []) << word ; hash
end.values.reject do |set|
set.length == 1
end.collect do |set|
set.sort
end.sort_by do |set|
[-set.length, set]
end.collect do |set|
set.join("\t")
end.each do |line|
f.puts(line)
end
end

----------------------------------------------------------------

WITH THE TOTALS

sets =
File.readlines("/usr/share/dict/words").collect do |word|
word.chomp.downcase
end.collect do |word|
[word.scan(/./).sort, word]
end.inject({}) do |hash, (base, word)|
(hash[base] ||= []) << word ; hash
end.values.reject do |set|
set.length == 1
end.collect do |set|
set.sort
end.sort_by do |set|
[-set.length, set]
end

largest = sets[0]
longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}

File.open("anagrams.txt", "w") do |f|
sets.each do |set|
f.puts(set.join("\t"))
end
end

puts "Total: %s" %
[sets.length]
puts "Largest set of anagrams: %s" %
[largest.inspect]
puts "Longest anagrams: %s at %s characters each" %
[longest.inspect, longest[0].length]

----------------------------------------------------------------

Mark Woodward

1/27/2007 11:19:00 PM

Hi Rob,

On Sat, 27 Jan 2007 23:56:42 +0900, Rob Biedenharn wrote:

> On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:
>> On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:
>>
>>> Hi all,
>>>
>>> I'm hoping for some critique of the code below.
>>> It works but is slow and probably not 'the ruby way'
>>>
>>> ---------------------------------------------------------
>>> # Anagrams
>>> #
>>> # Purpose: find all anagrams in /usr/share/dict/words
>>> # for 4 letter words (length 5 because of \n)
>>>
>>> # path to the word file
>>> filename = "/usr/share/dict/words"
>>>
>>> # check file exists and is readable
>>> if File.file?(filename) && File.readable?(filename) then
>>> dict_words = []
>>> all_anagrams = {}
>>>
>>> open(filename).each do |dict_word|
>>> if dict_word.length == 5 then
>>> dict_words << dict_word.chomp.strip
>>> end
>>> end
>>>
>>> wrds = dict_words
>>>
>>> dict_words.each do |wrd|
>>> current_wrd = wrd.split(//).sort.join
>>>
>>> anagrams = []
>>>
>>> wrds.each do |w|
>>> if wrd != w then
>>> if not current_wrd == w.split(//).sort.join then
>>> next
>>> end
>>> anagrams.push(w)
>>> end
>>> end
>>>
>>> if not anagrams.empty? then
>>> all_anagrams[wrd] = anagrams
>>> end
>>> end
>>>
>>> p all_anagrams
>>>
>>> end
>>> ---------------------------------------------------------
>>>
>>> thanks,
>>
>> I just found a solution here:
>> http://www.joeygibson.com/blog/tech/ruby/...
>>
>> that is exponentially faster than mine and finds all anagrams not
>> just 4
>> letter words! I just need to understand it now.
>>
>> ---------------------------------------------------------
>> words = IO.readlines("/usr/share/dict/words")
> Slurps in the entire dictionary as an array of lines (typically much
> quicker than reading a line at a time even with buffering).
>>
>> anagrams = Hash.new([])
> Build a new Hash where the default value is an empty array
>>
>> words.each do |word|
>> base = Array.new
>> word.chomp!.downcase!
>>
>> word.each_byte do |byte|
>> base << byte.chr
>> end
> Get the individual bytes in the word (after having stripped any line
> ending [chomp!] and standardizing the lettercase [downcase!])...
>>
>> base.sort!
> ..and put them in a standard order (by value, the bytes are just
> integers). This is essentially the same as you were doing with
> current_wrd (in fact, it might be interesting to see how much this
> change affects the overall speed).
>>
>> anagrams[base.to_s] |= [word]
> construct the string representation of the array of byte values to
> use as the hash key, and then use the array union operator to include
> the new word.
>
>> end
>>
>> # Find the anagrams by eliminating those with only one word
>> anagrams.reject! {|k, v| v.length == 1}
>>
>> values = anagrams.values.sort do |a, b|
>> b.length <=> a.length
>> end
> Largest sets first.
>>
>> File.open("anagrams.txt", "w") do |file|
>> values.each do |line|
>> file.puts(line.join(", "))
>> end
>> end
>>
>> largest = anagrams.keys.max do |a, b|
>> a.length <=> b.length
>> end
>>
>> puts "Total: #{anagrams.length}" #
>> puts "Largest set of anagrams: #{values[0].inspect}" #
>> print "Longest anagram: #{anagrams[largest].inspect} " #
>> puts "at #{largest.length} characters each"
>> ---------------------------------------------------------
>>
>> --
>> Mark
>
> Does that make more sense to you? (I nearly ignored your message,
> but your second post showed that you were really interested in
> understanding, not just trying to get others to do your code review ;-)

Excellent, thanks

Although my solution worked I knew it wasn't 'rubyish' and also slow. I
then found the second solution and was proved right!! Erics' solution
looks even more 'adventurous' so I'll have to spend some time with the
Cookbook, Pickaxe, irb and ri to try and get a grasp of it.

I actually benchmarked the different ways to get the words from the dict
soon after finding the second solution and IO.readlines was much quicker.

I thought this would be a good way to learn Ruby. ie find a simple
exercise, have a go and then get opinions on better ways to attacked
it. Like others here, rubyquiz is out of my reach ATM and I found this
'kata(6)' at the pragmatic programmers.

I've spent way too much time reading books and not enough time coding!

PS - your inline comments are very helpful.

>
> -Rob
>
> Rob Biedenharn http://agileconsult...
> Rob@AgileConsultingLLC.com

cheers,

--
Mark

Mark Woodward

1/27/2007 11:22:00 PM

Hi Erik,

On Sun, 28 Jan 2007 03:30:21 +0900, Erik Veenstra wrote:

> Here's a map-reduce-ish [1] solution.
>
> I don't say it's better, or faster, or smaller (memory-wise).
> It's just, uh, different...
>
> (Sorry, couldn't resist, again... ;])
>
> gegroet,
> Erik V. - http://www.erikve...
>
> [1] http://labs.google.com/papers/mapr...
>
> ----------------------------------------------------------------
>
> WITHOUT THE TOTALS
>
> File.open("anagrams.txt", "w") do |f|
> File.readlines("/usr/share/dict/words").collect do |word|
> word.chomp.downcase
> end.collect do |word|
> [word.scan(/./).sort, word]
> end.inject({}) do |hash, (base, word)|
> (hash[base] ||= []) << word ; hash
> end.values.reject do |set|
> set.length == 1
> end.collect do |set|
> set.sort
> end.sort_by do |set|
> [-set.length, set]
> end.collect do |set|
> set.join("\t")
> end.each do |line|
> f.puts(line)
> end
> end
>
> ----------------------------------------------------------------
>
> WITH THE TOTALS
>
> sets =
> File.readlines("/usr/share/dict/words").collect do |word|
> word.chomp.downcase
> end.collect do |word|
> [word.scan(/./).sort, word]
> end.inject({}) do |hash, (base, word)|
> (hash[base] ||= []) << word ; hash
> end.values.reject do |set|
> set.length == 1
> end.collect do |set|
> set.sort
> end.sort_by do |set|
> [-set.length, set]
> end
>
> largest = sets[0]
> longest = sets.inject{|a, b| a[0].length > b[0].length ? a : b}
>
> File.open("anagrams.txt", "w") do |f|
> sets.each do |set|
> f.puts(set.join("\t"))
> end
> end
>
> puts "Total: %s" %
> [sets.length]
> puts "Largest set of anagrams: %s" %
> [largest.inspect]
> puts "Longest anagrams: %s at %s characters each" %
> [longest.inspect, longest[0].length]
>
> ----------------------------------------------------------------

wow! now that's going to take some understanding!
Thanks for the homework over the next few nights ;-)

cheers,

--
Mark

William James

1/28/2007 2:07:00 AM

On Jan 27, 12:30 pm, "Erik Veenstra" <erikv...@gmail.com> wrote:
> Here's a map-reduce-ish [1] solution.
>
> I don't say it's better, or faster, or smaller (memory-wise).
> It's just, uh, different...
>
> (Sorry, couldn't resist, again... ;])
>
> gegroet,
> Erik V. -http://www.erikve...
>
> [1]http://labs.google.com/papers/mapr...
>
> ----------------------------------------------------------------
>
> WITHOUT THE TOTALS
>
> File.open("anagrams.txt", "w") do |f|
> File.readlines("/usr/share/dict/words").collect do |word|
> word.chomp.downcase
> end.collect do |word|
> [word.scan(/./).sort, word]
> end.inject({}) do |hash, (base, word)|
> (hash[base] ||= []) << word ; hash
> end.values.reject do |set|
> set.length == 1
> end.collect do |set|
> set.sort
> end.sort_by do |set|
> [-set.length, set]
> end.collect do |set|
> set.join("\t")
> end.each do |line|
> f.puts(line)
> end
> end

Do you know what a one-to-one mapping is?
It's senseless to use "collect"; use "map" instead.

open( 'anagrams.txt', 'w' ) { |file|
IO.readlines( 'words' ).inject({}){ |hash,word|
word.strip!.downcase!
(hash[word.split(//).sort.join] ||=[]) << word; hash }.
values.reject{|list| list.size < 2}.
map{|list| list.sort}.sort_by{|list| [ -list.size, list ] }.
each{|list| file.puts list.join( "\t" ) } }

Erik Veenstra

1/28/2007 10:37:00 AM

> Do you know what a one-to-one mapping is? It's senseless to
> use "collect"; use "map" instead.

Do you know they're exactly the same? It's senseless to use
"map" instead of "collect".

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

$ grep *.c -wie rb_define_method | grep -wie collect -wie map
array.c: rb_define_method(rb_cArray, "collect", rb_ary_collect,
0);
array.c: rb_define_method(rb_cArray, "collect!",
rb_ary_collect_bang, 0);
array.c: rb_define_method(rb_cArray, "map", rb_ary_collect, 0);
array.c: rb_define_method(rb_cArray, "map!", rb_ary_collect_bang,
0);
enum.c: rb_define_method(rb_mEnumerable,"collect", enum_collect,
0);
enum.c: rb_define_method(rb_mEnumerable,"map", enum_collect, 0);

----------------------------------------------------------------

comp.lang.ruby

Review of code please

Mark Woodward

Mark Woodward

Rob Biedenharn

Erik Veenstra

Mark Woodward

Mark Woodward

William James

Erik Veenstra

x Login to ForumsZone