Mohit Sindhwani
10/28/2007 6:15:00 PM
Sean O'Halpin wrote:
> On 10/28/07, Mohit Sindhwani <mo_mail@onghu.com> wrote:
>
>> I so have to get the hang of inject, flatten and map.
>>
>> Cheers,
>> Mohit.
>> 10/28/2007 | 9:16 PM.
>>
>
> Hi,
>
> They are definitely worth looking into - inject in particular is a
> powerful tool (Robert Klemme can make it do anything!). However, the
> following benchmark shows that a slight modification of your approach
> is actually pretty efficient. (The modification is to store the
> duplicates in a hash rather than an array so you can return the list
> of duplicates using Hash#keys).
>
> Regards,
> Sean
>
> # Mohit Sindhwani (with slight adjustment)
> def duplicates_1(array)
>   seen = { }
>   duplicates = { }
>   array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true}
>   duplicates.keys
> end
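A quick sanity check of the hash-based version (the definition is repeated here only so the snippet is self-contained; the sample input is illustrative, not from the benchmark):

```ruby
# Track items we've already seen and items seen more than once,
# each in a hash, so both membership tests are hash lookups.
def duplicates_1(array)
  seen = {}
  duplicates = {}
  array.each { |item| seen.key?(item) ? duplicates[item] = true : seen[item] = true }
  duplicates.keys
end

p duplicates_1([1, 2, 2, 3, 3, 3]).sort  # => [2, 3]
p duplicates_1([1, 2, 3])                # => []
```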
>
> # Robert Klemme
> def duplicates_2(array)
>   array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
> end
>
> # from facets
> def duplicates_3(array)
>   array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
> end
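Since `inject` is the part worth getting the hang of, here is the same accumulation stepped out on a tiny input (the input is made up for illustration):

```ruby
# Hash.new(0) gives every missing key a default value of 0,
# so ha[e] += 1 works without first checking whether the key exists.
counts = ["a", "b", "a", "a"].inject(Hash.new(0)) { |ha, e| ha[e] += 1; ha }
p counts  # => {"a"=>3, "b"=>1}

# Keep only the keys whose count is greater than one.
dups = counts.reject { |k, v| v == 1 }.keys
p dups    # => ["a"]
```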
>
> require 'benchmark'
>
> def do_benchmark(title, n, methods, *args, &block)
>   puts '-' * 40
>   puts title
>   puts '-' * 40
>   Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
>     methods.each do |meth|
>       x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
>     end
>   end
> end
>
> # get some data (Ubuntu specific I guess - YMMV)
> array = File.read('/etc/dictionaries-common/words').split(/\n/)
>
> # test w/o dups
> do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)
>
> # create some duplicates
> array = array[0..999] * 100
> do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)
>
> __END__
> $ ruby bm-duplicates.rb
> ----------------------------------------
> no duplicates
> ----------------------------------------
> user system total real
> duplicates_1 2.200000 0.010000 2.210000 ( 2.215057)
> duplicates_2 5.820000 0.000000 5.820000 ( 5.812414)
> duplicates_3 6.580000 0.010000 6.590000 ( 6.586708)
> ----------------------------------------
> duplicates
> ----------------------------------------
> user system total real
> duplicates_1 1.560000 0.000000 1.560000 ( 1.562587)
> duplicates_2 2.660000 0.000000 2.660000 ( 2.665301)
> duplicates_3 2.590000 0.000000 2.590000 ( 2.595189)
>
Thanks, Sean! That makes me feel quite good about my approach.
So, hash lookups really are faster than searching through an array?
Cheers,
Mohit.
10/29/2007 | 2:13 AM.
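The speedup comes down to lookup cost: checking a hash key is constant-time on average, while `Array#include?` scans the array element by element. A minimal sketch contrasting the two (the sizes and names here are illustrative, not from the post):

```ruby
require 'benchmark'

items = (1..10_000).to_a
as_hash = {}
items.each { |i| as_hash[i] = true }

Benchmark.bm(16) do |x|
  # Hash#key? hashes the key and jumps straight to its bucket.
  x.report("hash key?")      { 1_000.times { as_hash.key?(9_999) } }
  # Array#include? walks the array until it finds a match (near the end here).
  x.report("array include?") { 1_000.times { items.include?(9_999) } }
end
```

On any sizeable input the hash column should come out well ahead, which is why storing the duplicates in a hash pays off.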