
comp.lang.ruby

Your thoughts/philosophies on manual garbage collection

dkmd_nielsen

3/8/2007 10:13:00 PM

The process that initiated my message earlier (about deleting array
elements) is a rather long-running process of rebuilding and
reconfiguring parameter files. There are hundreds of files, each with as
many as 22,000 parameters to be processed. For example, four small test
files ran in about two minutes. There is a ton of string manipulation
going on, which probably translates into lots of trailing string
fragments and pointers lying around in RAM...clogging it up. I was
thinking of manually initiating garbage collection after every five or
ten files processed. Is that a smart thing?

What are your thoughts on manually initiated garbage collection? What
kinds of practices result in bits and pieces of objects and pointers
being left lying around in the ether of RAM? Are there tools that help
you see what happens to RAM while a process runs, the way a debugger
does with variables?

Thanks for everything
dvn

5 Answers

Ara.T.Howard

3/8/2007 10:18:00 PM

if you can fork - that's the best - then you just let each child's
death clean up that sub-segment of work's memory.

Robert Klemme

3/9/2007 10:08:00 AM


On 08.03.2007 23:18, ara.t.howard@noaa.gov wrote:
> On Fri, 9 Mar 2007, dkmd_nielsen wrote:
>
>> I was thinking of manually initiating garbage collection after every
>> five or ten files processed. Is that a smart thing?

To the OP: manually triggered GC is generally considered bad practice,
since it interferes with the automatic mechanism.
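
For reference, the trigger itself is just GC.start. What the OP proposes
would look roughly like this (a minimal sketch; files and process_file
are hypothetical stand-ins):

files.each_with_index do |file, i|
  process_file(file)
  GC.start if (i + 1) % 10 == 0   # force a full mark-and-sweep pass
end

In a loop like this the collector will almost certainly have run on its
own many times between the manual triggers.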

>> [...]
>
> if you can fork - that's the best - then you just let each child's
> death clean up that sub-segment of work's memory.

Also, forking has the added advantage of better utilizing multi-core CPUs.
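
A minimal sketch of that fork-and-die pattern (assuming a POSIX
platform; heavy_work is a hypothetical stand-in for one chunk of the
job):

pid = fork do
  heavy_work(file)    # every object allocated in the child...
end
Process.wait(pid)     # ...is reclaimed by the OS when the child exits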

If you do encounter excessive memory usage then you should

a) make sure you do not hold onto stuff longer than needed

b) check your algorithms for inefficient handling of objects; since you
mention string processing, this is a typical gotcha:

s += "foo" # creates a new string
s << "foo" # just appends to s

Another one:

a = []
a += ["foo", "bar"]     # builds a new Array and rebinds a
a << "foo" << "bar"     # appends in place
a.concat ["foo", "bar"] # appends in place
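
A quick way to convince yourself of the difference (a minimal irb-style
illustration):

s = "abc"
before = s.object_id
s << "def"
s.object_id == before   # => true  (same object, mutated in place)
s += "ghi"
s.object_id == before   # => false (+= built a new String and rebound s)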

c) If the files you are processing are large, you might also try some
kind of stream processing, where you do not have to keep the whole
file's content in memory (if that's applicable to your problem domain).
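
For example, something along these lines (a sketch with placeholder file
names and a trivial stand-in transformation):

File.open("params.out", "w") do |out|
  File.foreach("params.in") do |line|   # one line in memory at a time
    out.puts line.strip                 # stand-in for the real rewrite
  end
end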

Kind regards

robert

Joel VanderWerf

3/11/2007 7:26:00 PM


ara.t.howard@noaa.gov wrote:
> On Fri, 9 Mar 2007, dkmd_nielsen wrote:
> [...]
>
> if you can fork - that's the best - then you just let each child's
> death clean up that sub-segment of work's memory.

One caution: mark-and-sweep GC and fork don't always play well together,
in terms of sharing memory pages. The mark algorithm needs to touch all
live objects in the heap, and the child inherits the parent's heap with
copy-on-write. If the parent has a large heap and the child does a GC,
all those pages are copied into the child's address space, so memory
usage will scale badly as the number of child processes grows (if, say,
you factor your process into one child for each of the hundreds of
files).

It can be a good idea to GC.disable in the child (sketched below), in
some cases:

- parent has large heap, and

- child lifespan and allocation rate are such that it does not need to GC
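
In code, the pattern might look like this (a sketch; path and
process_one_file are hypothetical placeholders):

pid = fork do
  GC.disable              # the child never marks, so the parent's heap
  process_one_file(path)  # pages stay shared instead of being copied
  exit!                   # skip finalizers; the OS reclaims memory anyway
end
Process.wait(pid)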

Some benchmarks:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Daniel DeLorme

3/13/2007 11:54:00 AM


Joel VanderWerf wrote:
> One caution: mark-and-sweep GC and fork don't always play well together,
> in terms of sharing memory pages. [...]

I looked for some extra information on this topic and found:
http://blog.beaver.net/2005/03/ruby_gc_and_copyon...

That's pretty disheartening news to me. I had plans to make an fcgi-like
process manager that would take advantage of copy-on-write to reduce the
memory footprint of a webapp by pre-loading all libraries in the parent
process. But if ruby's GC renders COW useless... there's not much point
anymore.
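
For concreteness, the plan being described, which the GC behavior above
defeats, would be roughly this (a sketch; accept_request and handle are
hypothetical placeholders):

# parent: pay the library-loading cost exactly once
require 'yaml'
require 'erb'

loop do
  request = accept_request        # hypothetical: block for next request
  pid = fork do
    handle(request)               # counts on library pages staying
    exit!                         # copy-on-write shared with the parent
  end
  Process.detach(pid)             # reap the child without blocking
end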

Are there any plans to optimize ruby to make it fork-friendly?

Daniel

Gary Wright

3/13/2007 3:34:00 PM



On Mar 11, 2007, at 3:26 PM, Joel VanderWerf wrote:

> ara.t.howard@noaa.gov wrote:
>> if you can fork - that's the best - then you just let each child's
>> death clean up that sub-segment of work's memory.
>
> One caution: mark-and-sweep GC and fork don't always play well
> together, in terms of sharing memory pages. The mark algorithm
> needs to touch all live objects in the heap. The child inherits the
> parent's heap, with copy on write.

I think you are describing a different situation than the OP and Ara.

If you've got hundreds of files to process, and the processing is
sufficiently complex to justify forking for each file, then the parent
just iterates over the file list, forking and waiting for each child to
process each file. The parent's address space won't have all the stale
objects generated by the child's processing, so each new child starts
with a reasonable memory footprint.

One fork per file is the easiest to program, but if that is problematic
for some reason you could batch things up pretty easily (see the sketch
below).
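
A sketch of both variants (file_list and rebuild_parameter_file are
hypothetical placeholders):

# one fork per file: simplest
file_list.each do |path|
  pid = fork { rebuild_parameter_file(path) }
  Process.wait(pid)   # parent stays small; the child's garbage dies with it
end

# or batched, one child per 25 files
require 'enumerator'  # for each_slice on Ruby 1.8.6 and earlier
file_list.each_slice(25) do |batch|
  pid = fork { batch.each { |path| rebuild_parameter_file(path) } }
  Process.wait(pid)
end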


Gary Wright