comp.lang.ruby

GC and low file performance when large array is allocated

geert.fannes

10/13/2004 9:44:00 PM

Hello,

I noticed that Ruby's disk performance drops drastically when a large
array is allocated. I think it has to do with garbage collection, since
the performance recovers when I disable the garbage collector. I
created a small test program to illustrate the problem:

#
# begin of program
#
allocateBefore = true
useFileLoop    = true
disableGC      = false

GC.disable if disableGC

# create a file containing 100000 lines of 'test'
File.open('testfile', 'w') { |fo| 100000.times { fo.puts 'test' } }

largeArray = Array.new(20000000) if allocateBefore

if useFileLoop
  File.open('testfile') do |fi|
    fi.each { |line| }
  end
else
  1000000.times { |i| }
end

largeArray = Array.new(20000000) if !allocateBefore
#
# end of program
#

On my home PC, the above program takes 3.225 sec. If I allocate the
large array AFTER the fi.each loop by setting allocateBefore=false, it
takes only 0.467 sec. I get the same good performance when I disable
the garbage collection by setting disableGC=true. Unfortunately,
disabling GC is not an option in my real application, since my file is
a lot larger and all my memory gets consumed very fast.

If I play with allocateBefore and disableGC while the
1000000.times loop is enabled instead (by setting useFileLoop=false),
I don't see this difference anymore.

Any idea what is going on here? How can I achieve a good file
performance with large arrays in memory?

Greets,
Geert Fannes.
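For anyone trying to reproduce the timings above, a minimal harness using the stdlib Benchmark module could look like the sketch below (names are illustrative; a 2,000,000-element array is used instead of 20,000,000 to keep the sketch quick, and absolute numbers will vary by machine and Ruby version):

```ruby
require 'benchmark'

# Create the same 100_000-line test file as in the program above.
File.open('testfile', 'w') { |fo| 100_000.times { fo.puts 'test' } }

# Time the read loop alone...
plain = Benchmark.realtime do
  File.open('testfile') { |fi| fi.each { |line| } }
end

# ...then again with a large array resident in memory.
large_array = Array.new(2_000_000)
with_array = Benchmark.realtime do
  File.open('testfile') { |fi| fi.each { |line| } }
end

puts "read loop alone:           %.3f s" % plain
puts "with large array resident: %.3f s" % with_array
```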
5 Answers

Kent Sibilev

10/13/2004 11:07:00 PM


If you run Unix, maybe you should consider using the mmap module?

Cheers,
Kent.
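The mmap module Kent mentions is a third-party extension, so it is not sketched here. A stdlib-only way to get a related effect, fewer short-lived String objects for the GC to sweep during the loop, is to pull the file in as one String and scan it (a sketch; this avoids per-line Strings only for operations like counting, not for general line processing):

```ruby
# Create the same test file as in the original post.
File.open('testfile', 'w') { |fo| 100_000.times { fo.puts 'test' } }

data  = File.read('testfile')   # one big String, read in a single pass
lines = data.count("\n")        # counts newlines without allocating a String per line
puts "#{lines} lines"           # → 100000 lines
```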
On Oct 13, 2004, at 5:44 PM, Geert Fannes wrote:

> I noticed that Ruby's disk performance drops drastically when a large
> array is allocated. [full test program and timings snipped]
>
> Any idea what is going on here? How can I achieve a good file
> performance with large arrays in memory?



Charles Mills

10/14/2004 2:43:00 AM


On Oct 13, 2004, at 2:44 PM, Geert Fannes wrote:
> I noticed that Ruby's disk performance drops drastically when a large
> array is allocated. [test program snipped]
Here is the Shark (profiler) output for allocateBefore=false, with
everything else the same.
Took about 1.3 seconds.
9.9% 9.9% mach_kernel ml_restore
6.0% 6.0% ruby memfill
5.8% 5.8% ruby rb_yield_0
4.7% 4.7% mach_kernel ml_set_interrupts_enabled
4.6% 4.6% ruby saveFP
4.5% 4.5% ruby rb_call0
4.3% 4.3% ruby rb_eval
3.5% 3.5% libSystem.B.dylib szone_malloc
3.3% 3.3% libSystem.B.dylib _setjmp
2.8% 2.8% libSystem.B.dylib szone_free
2.6% 2.6% libSystem.B.dylib __error
2.0% 2.0% ruby rb_newobj
2.0% 2.0% mach_kernel hw_add_map
1.6% 1.6% ruby rb_call
1.6% 1.6% ruby new_dvar
1.3% 1.3% ruby rb_funcall
1.2% 1.2% mach_kernel tws_traverse_address_hash_list
1.2% 1.2% ruby st_lookup
1.2% 1.2% ruby restFP
1.1% 1.1% ruby obj_free
1.1% 1.1% ruby call_cfunc
1.1% 1.1% commpage __memcpy
1.0% 1.0% ruby io_write
1.0% 1.0% libSystem.B.dylib fwrite
1.0% 1.0% libSystem.B.dylib __sfvwrite
0.9% 0.9% ruby rb_io_puts
0.9% 0.9% ruby rb_io_fwrite
0.8% 0.8% mach_kernel vm_fault


Output for allocateBefore=true and everything else the same.
Took about 9.7 seconds.
45.9% 45.9% ruby gc_mark
42.1% 42.1% ruby gc_mark_children
1.1% 1.1% mach_kernel ml_restore
1.1% 1.1% mach_kernel ml_set_interrupts_enabled
0.6% 0.6% ruby memfill
0.6% 0.6% ruby rb_eval
0.5% 0.5% ruby rb_yield_0
0.5% 0.5% ruby rb_call0
0.4% 0.4% ruby saveFP
0.4% 0.4% libSystem.B.dylib szone_malloc
0.3% 0.3% libSystem.B.dylib szone_free
0.3% 0.3% libSystem.B.dylib _setjmp
0.3% 0.3% ruby rb_call
0.3% 0.3% ruby call_cfunc
0.2% 0.2% commpage __memcpy
0.2% 0.2% mach_kernel hw_add_map
0.2% 0.2% ruby rb_newobj
0.2% 0.2% libSystem.B.dylib __error
0.2% 0.2% ruby restFP
0.2% 0.2% ruby io_write
0.2% 0.2% ruby obj_free
0.2% 0.2% libSystem.B.dylib __sfvwrite
0.1% 0.1% mach_kernel vm_page_grab
0.1% 0.1% ruby st_foreach
0.1% 0.1% ruby rb_funcall
0.1% 0.1% mach_kernel vm_fault
0.1% 0.1% ruby st_lookup

Definitely a performance hit. Pretty interesting.
-Charlie
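The gc_mark/gc_mark_children numbers above line up with the collector running repeatedly during the loop. On Ruby 1.9 or later (well after this thread), the number of collections can be counted directly with GC.count, a sketch:

```ruby
# Assumes Ruby >= 1.9, where GC.count is available.
large_array = Array.new(2_000_000)   # a large live structure the marker must walk
before = GC.count
100_000.times { |i| t = 't' }        # each iteration allocates a fresh String
runs = GC.count - before
puts "GC ran #{runs} time(s) during the loop"
```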



geert.fannes

10/14/2004 6:51:00 AM


Hello,

I played some more with the test program and apparently it has nothing
to do with file access. The program below is a simplified version that
gets more directly to the point. If I replace the string allocation
t="t" with t=1, the performance drop disappears.

#
# begin of program
#
allocateBefore = true
disableGC      = false

GC.disable if disableGC

largeArray = Array.new(20000000) if allocateBefore

100000.times { |i| t = "t" }

largeArray = Array.new(20000000) if !allocateBefore
#
# end of program
#

Any idea why the string allocation (and possibly deallocation) takes
so much more time when there is a large array in memory? Can I destroy
an object manually? That could be helpful in combination with
disabling the garbage collection.

Greets,
Geert Fannes.
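Ruby has no way to free a single object by hand, but the collector can be switched off around a hot loop and run once afterwards. A sketch of that batching pattern (stdlib only; whether it helps depends on how much garbage the loop creates, since everything accumulates until the final GC.start):

```ruby
large_array = Array.new(2_000_000)

GC.disable                      # no collections (and no re-marking of large_array) inside the loop
100_000.times { |i| t = 't' }   # garbage piles up instead of being collected
GC.enable
GC.start                        # pay the marking cost once, after the loop

puts "done: array of #{large_array.size} still live"
```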
geert.fannes@gmail.com (Geert Fannes) wrote in message news:<bc64b7df.0410131344.16c4856e@posting.google.com>...
> [full original post snipped]

Ruben Vandeginste

10/14/2004 7:13:00 AM



At Thu, 14 Oct 2004 15:54:32 +0900,
Geert Fannes wrote:
>
> I played some more with the test program and apparently it has nothing
> to do with the file access. [simplified test program snipped]
> Any idea why the string allocation (and possibly deallocation) takes
> so much more time when there is a large array in memory? Can I destroy
> an object manually? This could be helpfull in combination with
> disabling the garbage collection.
>

My guess is that during the loop 100000.times{|i|t="t"}, the garbage
collector runs out of memory, maybe once, maybe more. And when it runs
out of memory, it has to mark/read/follow all the cells in
"largeArray", and since that array is very large, I think that's
causing the slowdown.
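That guess can be checked with a small measurement: time the same allocation loop with and without a large live array. A sketch using the stdlib Benchmark module (numbers will vary, and on modern Rubies with a generational collector the gap is much smaller than on 2004-era mark-and-sweep MRI):

```ruby
require 'benchmark'

# The allocation loop alone.
without = Benchmark.realtime { 100_000.times { t = 't' } }

# The same loop while a large array is live, so every full GC must re-mark it.
large_array = Array.new(2_000_000)
with_array = Benchmark.realtime { 100_000.times { t = 't' } }

puts "loop without large array: %.4f s" % without
puts "loop with large array:    %.4f s" % with_array
```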

Try the following and you'll see it's faster:

#
# begin of program
#
allocateBefore = true
disableGC      = false

GC.disable if disableGC

largeArray = Array.new(20000000) if allocateBefore
largeArray = 0   # explicitly setting to 0 so that the
                 # gc will not need to mark it
100000.times { |i| t = "t" }

largeArray = Array.new(20000000) if !allocateBefore
#
# end of program
#

Ruben

Ruben Vandeginste

10/14/2004 7:23:00 AM


At Thu, 14 Oct 2004 16:13:22 +0900,
Ruben wrote:
>
> My guess is that in the loop 100000.times{|i|t="t"}, the garbage
> collector will run out of memory [...] I think that's causing the
> slowdown.
>
> Try the following and you'll see it's faster:

Sorry, disregard this. I somehow messed up the timings from some
tests.

Ruben