Robert Klemme
6/11/2007 7:01:00 AM
On 11.06.2007 07:41, Anthony Martinez wrote:
> So, I was tinkering with ways to build a hash by transforming an
> array, knowing the standard/idiomatic approach:
>
> id_list = [:test1, :test2, :test3]
> id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
>
> I also decided to try something like this:
>
> Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
>
> and further (attempt to) optimize it via
> Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
> and
> Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
>
> Running this via Benchmark.bmbm gives pretty interesting and, to me,
> unexpected results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4):
>
> require 'benchmark'
> id_list = (1..1_000_000).to_a
> Benchmark.bmbm do |x|
>   x.report("inject")   { id_list.inject({}) { |a,e| a[e] = e.object_id ; a } }
>   x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id] }.flatten ] }
>   x.report("bang")     { Hash[ *id_list.collect { |e| [e,e.object_id] }.flatten! ] }
>   # NB: collect! and flatten! here modify id_list itself, so every
>   # report that runs after this one sees the altered array
>   x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id] }.flatten! ] }
> end
>
> Rehearsal --------------------------------------------
> inject     16.083333  0.033333   16.116667 (  9.670747)
> non-bang 1657.050000  1.800000 1658.850000 (995.425642)
> bang     1593.716667  0.016667 1593.733333 (956.334565)
> two-bang 1604.816667  1.350000 1606.166667 (963.803356)
> -------------------------------- total: 4874.866667sec
>
>                 user    system       total         real
> inject      5.183333  0.000000    5.183333 (  3.102379)
> non-bang zsh: segmentation fault ruby
>
> Ow?
>
> Also, I just thought of a similar way to accomplish the same thing:
>
> x.report("zip") { Hash[ *id_list.zip(id_list.collect { |e| e.object_id }).flatten ] }
>
> Array#collect! won't work right with this, of course, but the zip
> version seems to perform equally badly. Is Enumerable#inject just
> optimized for this, or something?
What you are seeing (the poor timings and, presumably, the crash as
well) is most likely caused by the different approach. When you use
#inject you create just one new large object, the Hash. When you use
#collect you create at least one additional copy of the large array
plus a ton of two-element arrays. That's far less efficient in terms
of memory usage and GC. You'll probably see very different results if
your input array is much shorter (try with 10 or 100 elements).
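
To make the allocation argument concrete, here is a rough sketch (not part of the original benchmark) that counts how many objects each approach allocates. It assumes a Ruby recent enough to support GC.stat(:total_allocated_objects), i.e. 2.2 or later rather than the 1.8.4 used above. The helper names allocations, build_with_inject and build_with_collect are made up for illustration.

```ruby
# Hypothetical helper: number of Ruby objects allocated while the block runs.
# Assumes GC.stat(:total_allocated_objects), which needs Ruby 2.2 or later.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# One new Hash, filled in place.
def build_with_inject(list)
  list.inject({}) { |h, e| h[e] = e.object_id; h }
end

# One two-element Array per input element, plus the collected and the
# flattened arrays, all created before the Hash itself.
def build_with_collect(list)
  Hash[*list.collect { |e| [e, e.object_id] }.flatten]
end

# Small inputs only, as suggested above (10, 100, ...).
[10, 100, 1_000].each do |n|
  list = (1..n).to_a
  inject_count  = allocations { build_with_inject(list) }
  collect_count = allocations { build_with_collect(list) }
  puts "n=#{n}: inject=#{inject_count} collect=#{collect_count}"
end
```

The exact counts depend on the Ruby version, but the collect figure should grow with n while the inject figure stays roughly constant.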
Kind regards
robert