[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

Michal Suchanek

6/12/2007 9:52:00 AM

On 11/06/07, Robert Klemme <shortcutter@googlemail.com> wrote:
> On 11.06.2007 07:41, Anthony Martinez wrote:
> > So, I was tinkering with ways to build a hash out of transforming an
> > array, knowing the standard/idiomatic
> >
> > id_list = [:test1, :test2, :test3]
> > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
> >
> > I also decided to try something like this:
> >
> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
> >
> > and further (attempt to) optimize it via
> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
> > and
> > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
> >
> > Running this via Benchmark#bmbm gives pretty interesting, and to me,
> > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
> >
> > require 'benchmark'
> > id_list = (1..1_000_000).to_a
> > Benchmark::bmbm do |x|
> > x.report("inject") { id_list.inject({}) { |a,e| a[e] = e.object_id ; a} }
> > x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten] }
> > x.report("bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!] }
> > x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!] }
> > end
> >
> > Rehearsal --------------------------------------------
> > inject 16.083333 0.033333 16.116667 ( 9.670747)
> > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
> > bang 1593.716667 0.016667 1593.733333 (956.334565)
> > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
> > -------------------------------- total: 4874.866667sec
> >
> > user system total real
> > inject 5.183333 0.000000 5.183333 ( 3.102379)
> > non-bang zsh: segmentation fault ruby
> >
> > Ow?
> >
> > Also, I just thought of a similar way to accomplish the same thing:
> >
> > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e| e.object_id})] }
> >
> > Array#collect! won't work right with this, of course, but it seems to
> > have equally-bad performance. Is Array#inject just optimized for this,
> > or something?
>
> The reason why you are seeing this (performance as well as timing) is
> most likely caused by the different approach. When you use #inject you
> just create one copy of the Array (the Hash). When you use #collect you
> create at least one additional copy of the large array plus a ton of two
> element arrays. That's way less efficient considering memory usage and
> GC. You'll probably see much different results if your input array is
> much shorter (try with 10 or 100 elements).
>

I also get a segfault (with 1.8.5 on Gentoo):

ruby -w test.rb
Rehearsal --------------------------------------------
inject 7.210000 0.870000 8.080000 ( 13.011414)
non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
bang 4586.140000 200.530000 4786.670000 (5970.267190)
two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
------------------------------- total: 14462.910000sec

user system total real
inject 10.740000 1.990000 12.730000 ( 15.500874)
non-bang Segmentation fault
hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]

Of course, it would be better to test with 1.8.6 but it takes quite
some time to retest ;-)

Thanks

Michal

6 Answers

Robert Klemme

6/12/2007 11:12:00 AM

0

On 12.06.2007 11:52, Michal Suchanek wrote:
> On 11/06/07, Robert Klemme <shortcutter@googlemail.com> wrote:
>> On 11.06.2007 07:41, Anthony Martinez wrote:
>> > So, I was tinkering with ways to build a hash out of transforming an
>> > array, knowing the standard/idiomatic
>> >
>> > id_list = [:test1, :test2, :test3]
>> > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
>> >
>> > I also decided to try something like this:
>> >
>> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
>> >
>> > and further (attempt to) optimize it via
>> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
>> > and
>> > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
>> >
>> > Running this via Benchmark#bmbm gives pretty interesting, and to me,
>> > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
>> >
>> > require 'benchmark'
>> > id_list = (1..1_000_000).to_a
>> > Benchmark::bmbm do |x|
>> > x.report("inject") { id_list.inject({}) { |a,e| a[e] =
>> e.object_id ; a} }
>> > x.report("non-bang") { Hash[ *id_list.collect { |e|
>> [e,e.object_id]}.flatten] }
>> > x.report("bang") { Hash[ *id_list.collect { |e|
>> [e,e.object_id]}.flatten!] }
>> > x.report("two-bang") { Hash[ *id_list.collect! { |e|
>> [e,e.object_id]}.flatten!] }
>> > end
>> >
>> > Rehearsal --------------------------------------------
>> > inject 16.083333 0.033333 16.116667 ( 9.670747)
>> > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
>> > bang 1593.716667 0.016667 1593.733333 (956.334565)
>> > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
>> > -------------------------------- total: 4874.866667sec
>> >
>> > user system total real
>> > inject 5.183333 0.000000 5.183333 ( 3.102379)
>> > non-bang zsh: segmentation fault ruby
>> >
>> > Ow?
>> >
>> > Also, I just thought of a similar way to accomplish the same thing:
>> >
>> > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e|
>> e.object_id})] }
>> >
>> > Array#collect! won't work right with this, of course, but it seems to
>> > have equally-bad performance. Is Array#inject just optimized for this,
>> > or something?
>>
>> The reason why you are seeing this (performance as well as timing) is
>> most likely caused by the different approach. When you use #inject you
>> just create one copy of the Array (the Hash). When you use #collect you
>> create at least one additional copy of the large array plus a ton of two
>> element arrays. That's way less efficient considering memory usage and
>> GC. You'll probably see much different results if your input array is
>> much shorter (try with 10 or 100 elements).
>>
>
> I also get a segfault (with 1.8.5 on Gentoo):
>
> ruby -w test.rb
> Rehearsal --------------------------------------------
> inject 7.210000 0.870000 8.080000 ( 13.011414)
> non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
> bang 4586.140000 200.530000 4786.670000 (5970.267190)
> two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
> ------------------------------- total: 14462.910000sec
>
> user system total real
> inject 10.740000 1.990000 12.730000 ( 15.500874)
> non-bang Segmentation fault
> hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
> ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]
>
> Of course, it would be better to test with 1.8.6 but it takes quite
> some time to retest ;-)
>
> Thanks
>
> Michal

I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
1000000
13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:11:14 [Temp]: uname -a
CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
Cygwin
13:11:36 [Temp]: ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

Kind regards

robert

Florian Frank

6/12/2007 2:17:00 PM

0

Robert Klemme wrote:
> I believe it's the star operator:
>
> 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
> 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
> -e:1: [BUG] Segmentation fault
> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>
> Aborted (core dumped)
> 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
> 1000000
> 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
> 10000000
> -e:1: [BUG] Segmentation fault
> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>
> Aborted (core dumped)
> 13:11:14 [Temp]: uname -a
> CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
> Cygwin
> 13:11:36 [Temp]: ruby --version
> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

The reason for this problem is, that the stack is used to pass A LOT of
arguments to the Hash.[] method. If the stack size of the executed process isn't
big enough (check your resource limits), this can lead to crashes.

--
Florian Frank

Robert Klemme

6/12/2007 2:24:00 PM

0

On 12.06.2007 16:16, Florian Frank wrote:
> Robert Klemme wrote:
>> I believe it's the star operator:
>>
>> 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
>> 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
>> -e:1: [BUG] Segmentation fault
>> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>>
>> Aborted (core dumped)
>> 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
>> 1000000
>> 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
>> 10000000
>> -e:1: [BUG] Segmentation fault
>> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>>
>> Aborted (core dumped)
>> 13:11:14 [Temp]: uname -a
>> CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
>> Cygwin
>> 13:11:36 [Temp]: ruby --version
>> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]
>
> The reason for this problem is, that the stack is used to pass A LOT of
> arguments to the Hash.[] method. If the stack size of the executed process isn't
> big enough (check your resource limits), this can lead to crashes.

That was what I was guessing. However, I haven't enough insight to
verify. There is at least one point that causes me doubt: the array can
be handled and stored like any other array, so it should be on the heap
(or it is implicitly moved there).

Kind regards

robert

Mauricio Fernández

6/12/2007 2:40:00 PM

0

On Tue, Jun 12, 2007 at 11:25:04PM +0900, Robert Klemme wrote:
> On 12.06.2007 16:16, Florian Frank wrote:
> >Robert Klemme wrote:
> >>I believe it's the star operator:
> >>
> >>13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
> >>13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
> >>-e:1: [BUG] Segmentation fault
> >>ruby 1.8.6 (2007-03-13) [i386-cygwin]
[...]
> >The reason for this problem is, that the stack is used to pass A LOT of
> >arguments to the Hash.[] method. If the stack size of the executed process
> >isn't
> >big enough (check your resource limits), this can lead to crashes.
>
> That was what I was guessing. However, I haven't enough insight to
> verify. There is at least one point that causes me doubt: the array can
> be handled and stored like any other array, so it should be on the heap
> (or it is implicitly moved there).

$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2091000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
Segmentation fault
$ ulimit -s
8192
$ ulimit -s 16384
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4184000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4190000
Segmentation fault

--
Mauricio Fernandez - http://eige... - singular Ruby

Florian Frank

6/12/2007 3:51:00 PM

0

Robert Klemme wrote:
> That was what I was guessing. However, I haven't enough insight to
> verify. There is at least one point that causes me doubt: the array can
> be handled and stored like any other array, so it should be on the heap
> (or it is implicitly moved there).

Yes, the implementation of this is in eval.c's rb_call0, after that the
arguments are passed along via the argc/argv function arguments. Ruby also tries
to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
guaranteed to work in all cases.

--
Florian Frank

Nobuyoshi Nakada

6/13/2007 4:40:00 AM

0

Hi,

At Wed, 13 Jun 2007 00:50:30 +0900,
Florian Frank wrote in [ruby-talk:255348]:
> Yes, the implementation of this is in eval.c's rb_call0, after that the
> arguments are passed along via the argc/argv function arguments. Ruby also tries
> to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
> guaranteed to work in all cases.

Actually, the definition of TMP_ALLOC. Try replacing #ifdef
C_ALLOCA with #if 1.

--
Nobu Nakada