comp.lang.ruby

enterprise ruby

Roger Pack

11/9/2007 8:28:00 PM

I am thinking of doing a 'side by side' distro of Ruby that includes the
latest SVN up's, as well as some 'fringe' best practices, like a tweaked
GC.
It would have the ability to do force_recycle on objects arbitrarily (at
your own risk), and getters and setters for the GC variables (like how
often to recycle, how close you are to the next collection, how big of
heap blocks to use, etc.)
and also have a GC that is copy-on-write friendly (takes barely longer,
but doesn't dirty memory).
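
As a rough sketch of what the interface might look like (nothing here
exists at the Ruby level in stock MRI today; the method names are just
placeholders for the proposal):

# Hypothetical API sketch only; none of these methods exist in stock MRI.
GC.malloc_limit = 8_000_000        # malloc'd bytes allowed before a GC run
GC.heap_slot_increment = 10_000    # size of newly allocated heap blocks
puts GC.slots_until_next_run       # how close we are to the next collection

tmp = "big intermediate buffer" * 10_000
GC.force_recycle(tmp)              # reclaim this object right now, at your own risk
tmp = nil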

And any other personal tweaks that people contribute. Kind of a
bleeding edge Ruby.

Would that be useful to anyone? Would anyone use it?
Thanks and take care.
-Roger
--
Posted via http://www.ruby-....

49 Answers

Robert Klemme

11/9/2007 9:05:00 PM

On 09.11.2007 21:28, Roger Pack wrote:
> I am thinking of doing a 'side by side' distro of Ruby that includes the
> latest SVN up's, as well as some 'fringe' best practices, like a tweaked
> GC.
> It would have the ability to do force_recycle on objects arbitrarily (at
> your own risk), and getters and setters for the GC variables (like how
> often to recycle, how close you are to the next collection, how big of
> heap blocks to use, etc.)
> and also have a GC that is copy-on-write friendly (takes barely longer,
> but doesn't dirty memory).
>
> And any other personal tweaks that people contribute. Kind of a
> bleeding edge Ruby.
>
> Would that be useful to anyone? Would anyone use it?
> Thanks and take care.

Personally, if I had the resources to invest into this I'd rather spend
them on JRuby. You get a GC with many tweaking options etc. plus native
threads.
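
For instance, a trivially parallel script like the sketch below runs
unchanged on both interpreters, but only JRuby can spread the threads
across cores via native threads (MRI 1.8's green threads all share one):

# Same script on MRI and JRuby; only JRuby runs the CPU-bound threads
# truly in parallel on native threads.
require 'thread'

def busy_work
  (1..500_000).inject(0) { |sum, i| sum + i }
end

start = Time.now
threads = (1..4).map { Thread.new { busy_work } }
threads.each { |t| t.join }
puts "4 threads took #{Time.now - start} seconds"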

Kind regards

robert

Lionel Bouton

11/10/2007 12:11:00 AM

Roger Pack wrote the following on 09.11.2007 21:28 :
> I am thinking of doing a 'side by side' distro of Ruby that includes the
> latest SVN up's, as well as some 'fringe' best practices, like a tweaked
> GC.
> It would have the ability to do force_recycle on objects arbitrarily (at
> your own risk), and getters and setters for the GC variables (like how
> often to recycle, how close you are to the next collection, how big of
> heap blocks to use, etc.)
> and also have a GC that is copy-on-write friendly (takes barely longer,
> but doesn't dirty memory).
>
> And any other personal tweaks that people contribute. Kind of a
> bleeding edge Ruby.
>
> Would that be useful to anyone? Would anyone use it?
>

Useful? Yes, it would: I have some Rails actions that eat memory quite
happily (associations with hundreds of thousands of objects which
themselves have associations on which to work...). It would help if the
Ruby processes would let go of the memory, or at least let it live in
swap undisturbed, once the action is done.
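
To give an idea of the pattern (the model names are invented; this is
only illustrative Rails-style code, not my actual application):

# Invented models; the point is only the allocation pattern.
class ReportsController < ApplicationController
  def yearly_summary
    # Loads every order *and* its line items into Ruby objects at once.
    orders = Order.find(:all, :include => :line_items)
    @total = orders.inject(0) do |sum, order|
      sum + order.line_items.inject(0) { |s, item| s + item.price }
    end
    # All of these objects stay reachable until the end of the action,
    # and the memory the interpreter grew to hold them is never handed
    # back to the OS afterwards.
  end
end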

Would I use it? Probably. I'll have to find the time both to review your
patch myself and to stress-test it (I prefer to test it and understand it
first-hand because I suppose it won't be used by many). Would the patch
be easy to understand for someone familiar with Ruby, GC techniques,
and C? Or is prior knowledge of Ruby's internals a must?

Regards,

Lionel

Lionel Bouton

11/10/2007 12:35:00 AM

Robert Klemme wrote the following on 09.11.2007 22:05 :
>
> Personally, if I had the resources to invest into this I'd rather
> spend them on JRuby. You get a GC with many tweaking options etc.
> plus native threads.
>

Please don't forget that many gems still don't work or don't have a
replacement in JRuby. JRuby is *the* solution for people needing easy
Ruby <-> Java integration, but Ruby with strong Unix ties has its
benefits too.

I think I'd have to spend quite some time migrating applications from
MRI to JRuby: I use ruby-gettext, hpricot, memcache and ruby-opengl
heavily, and I believe most of these use C for library interfaces or
performance... some utilities like rcov probably don't work with JRuby
either, because they rely on the same C interface.

So as much as I'd like JRuby to succeed, even if I don't use it myself
(currently), people willing to work on MRI (or YARV and Rubinius, for
that matter) are most welcome to do so too.

But maybe there is an efficient way to use JNI to trivially port most of
these to JRuby. That could motivate my toying with JRuby...

Regards,

Lionel

M. Edward (Ed) Borasky

11/10/2007 3:36:00 AM

Roger Pack wrote:
> I am thinking of doing a 'side by side' distro of Ruby that includes the
> latest SVN up's, as well as some 'fringe' best practices, like a tweaked
> GC.
> It would have the ability to do force_recycle on objects arbitrarily (at
> your own risk), and getters and setters for the GC variables (like how
> often to recycle, how close you are to the next collection, how big of
> heap blocks to use, etc.)
> and also have a GC that is copy-on-write friendly (takes barely longer,
> but doesn't dirty memory).
>
> And any other personal tweaks that people contribute. Kind of a
> bleeding edge Ruby.
>
> Would that be useful to anyone? Would anyone use it?
> Thanks and take care.
> -Roger

What would be more useful to me, and in fact where I'm headed, is a Ruby
that's tunable to your *hardware*. Just make a *source* distribution and
force people to recompile it. Right now, my tweaks are all at the GCC
level, and that's the way it's going to be for a while. I don't believe
I've exhausted all of the goodies that GCC has to offer, especially GCC 4.2.

Another thing that would be more useful is a test and benchmark suite
comprehensive enough that a user could tell what the payoffs were from
the tweaks and whether the language's syntax and semantics remained
intact afterwards.
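
Even a tiny harness like the one below (plain stdlib Benchmark; the
workloads are placeholders), run before and after a tweak, would give a
first-order answer to whether the tweak paid off:

# Minimal before/after benchmark sketch using the stdlib Benchmark
# module. The workloads are placeholders; a real suite would exercise
# GC-heavy and CPU-heavy paths separately and also run the language's
# own test suite to check that semantics survived the tweaks.
require 'benchmark'

Benchmark.bm(12) do |bm|
  bm.report("allocation:") do
    100_000.times { Array.new(10) { "x" * 20 } }
  end
  bm.report("arithmetic:") do
    (1..1_000_000).inject(0) { |sum, i| sum + i }
  end
end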

I'm in the process of refactoring the Rakefile from my profiling
efforts. I'd be happy to profile your source as part of that. By the
way, are you starting with 1.9 or 1.8? I'm still profiling 1.8 only, but
I expect to have 1.9 profiling working within a week or so.

M. Edward (Ed) Borasky

11/10/2007 4:58:00 AM

Lionel Bouton wrote:
> Useful? Yes, it would: I have some Rails actions that eat memory quite
> happily (associations with hundreds of thousands of objects which
> themselves have associations on which to work...). It would help if the
> ruby processes would let go of the memory or at least let it live in
> swap undisturbed once the action is done.

Sounds to me like you're building a data structure in RAM to avoid
making your RDBMS earn its keep. ;) But seriously, unless you can
restructure your application so it doesn't keep a lot of stuff in RAM,
you're probably doomed to throw hardware at it. In other words, hard
drives are where data that must live for extended periods (or forever)
belong, *explicitly* filed there by your application code, not
*implicitly* filed there by the swapper. RAM is for volatile information
that is being used and re-used frequently.


Lionel Bouton

11/10/2007 10:26:00 AM

M. Edward (Ed) Borasky wrote the following on 10.11.2007 05:58 :
> Lionel Bouton wrote:
>> Useful? Yes, it would: I have some Rails actions that eat memory quite
>> happily (associations with hundreds of thousands of objects which
>> themselves have associations on which to work...). It would help if the
>> ruby processes would let go of the memory or at least let it live in
>> swap undisturbed once the action is done.
>
> Sounds to me like you're building a data structure in RAM to avoid
> making your RDBMS earn its keep. ;)

In some cases, yes, because the code is easier to maintain that way. I
usually take the time to switch to SQL when it becomes a problem and
pure SQL is powerful enough for the task, though (I've done that several
times in the last month).

But my current problem is that simply iterating over large associations
(for example, to create a new object for each and every object on the
other end of a has_many association, with complex business rules SQL
can't handle) is enough to use 100-300MB with hundreds of thousands of
objects. Usually I can split the task by paginating through the whole
set, but in some cases it isn't possible: if inserts or deletes happen
concurrently you can miss some objects or try to process some twice (I'm
actually considering fetching all the primary keys in a first pass and
then paginating using windows in this set, which comes with other
problems, though manageable ones in my case)...
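
Something along these lines (invented model and method names, Rails
1.x/2.x-era syntax) is roughly what I have in mind:

# Sketch of the "fetch the primary keys first, then window through them"
# idea. Order and apply_complex_business_rules! are invented names.
require 'enumerator'   # for each_slice on Ruby 1.8.6

ids = Order.connection.select_values(
  "SELECT id FROM orders WHERE state = 'open'")

ids.each_slice(1_000) do |batch|
  Order.find(:all, :conditions => ["id IN (?)", batch]).each do |order|
    order.apply_complex_business_rules!
  end
  # Only ~1,000 ActiveRecord objects are live per iteration, so the peak
  # heap stays small, and working from a fixed id list avoids processing
  # the same row twice when rows are inserted or deleted underneath the
  # loop.
end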

A temporary 100-300MB spike isn't a problem; what is a problem is that:
1/ the memory isn't freed after completion of the task,
2/ it's kept dirty by the GC,
-> there's no way the OS can reuse this memory for another spike
happening in another process; only the original process can reuse it.

This is not a major problem: I can always move all this heavy
processing into short-lived dedicated processes, but it's kind of a
downer when the language keeps out of your way most of the time and then
shows a limitation like this.

> But seriously, unless you can restructure your application so it
> doesn't keep a lot of stuff in RAM, you're probably doomed to throw
> hardware at it.

Yes, I've done some simple code tuning that helps memory usage, but it
only tides us over while we wait for a bigger server.

> In other words, hard drives are where data that must live for extended
> periods (or forever) belong, *explicitly* filed there by your
> application code, not *implicitly* filed there by the swapper. RAM is
> for volatile information that is being used and re-used frequently.
>

As I understand it, the problem is that MRI keeps some unused memory
allocated and then the GC marks it dirty... So technically there's
information being used and re-used frequently but only by the GC :-(

Lionel.


M. Edward (Ed) Borasky

11/10/2007 8:47:00 PM

Lionel Bouton wrote:
> But my current problem is that simply iterating other large associations
> (to create a new object for each and every object on the other end of a
> has_many association with complex business rules SQL can't handle for
> example) is enough to use 100-300MB with hundreds of thousands of
> objects.

Ah ... complex business rules. That's the big problem with programming
languages -- they make it possible to *have* complex business rules.
Before computers were invented, we had to make do with "buy low, sell
high, collect early, pay late" and double-entry bookkeeping. :)

> A temporary 100-300MB spike isn't a problem, what is a problem is that :
> 1/ the memory isn't freed after completion of the task,
> 2/ it's kept dirty by the GC
> ->there's no way the OS can reuse this memory for another spike
> happening in another process, only the original process can reuse it.
>
> This is not a major problem : I can always move all these huge
> processings in short-lived dedicated processes but it's kind of a downer
> when the language keeps out of your way most of the time and then shows
> one limitation.
>

[snip]

> As I understand it, the problem is that MRI keeps some unused memory
> allocated and then the GC marks it dirty... So technically there's
> information being used and re-used frequently but only by the GC :-(


Well ... that sounds like an actual bug rather than a design issue in
MRI. Is it that the GC can't tell it's unused?

Roger Pack

11/10/2007 9:29:00 PM

>> As I understand it, the problem is that MRI keeps some unused memory
>> allocated and then the GC marks it dirty... So technically there's
>> information being used and re-used frequently but only by the GC :-(
>
>
> Well ... that sounds like an actual bug rather than a design issue in
> MRI. Is it that the GC can't tell it's unused?

The GC's mark and sweep 'recreates' its freelist every time it runs a
GC, so if you have a lot of free objects (believe it or not), it will
re-mark them all--possibly in about the same order as the previous run.
It's a design thing.

So this interesting point of yours has two implications: a Ruby process
that retains lots of 'free' memory will have a longer sweep time (having
lots of free objects is quite common with the standard MRI--it allocates
exponentially larger and larger heap sizes, so with a large process
you're almost guaranteed that the 'last-most' heap will be half used),
and, as you noted, the entire thing is constantly re-marked on every GC
(all used objects marked as 'valid', all free objects re-added to the
freelist).

The way to avoid this would be to 'only add' to the freelist as you
deallocate objects. Then you'd avoid marking the free objects. You
could still free heaps the same way. If you did that you'd still be
traversing them on every GC (to look for allocated objects that are no
longer accessible--unmarked objects), but you wouldn't be marking them
dirty. The drawback might be a freelist that isn't 'optimized in order'
or something (probably not much of a drawback).
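
A toy model in plain Ruby of the two sweep strategies (just a thought
experiment; the real thing lives in MRI's C code):

# Toy model only: each heap slot is either live or free.
Slot = Struct.new(:live)
heap = Array.new(10_000) { Slot.new(rand < 0.5) }   # roughly half free

# Current behaviour: every sweep walks *every* slot and rebuilds the
# freelist from scratch, touching (and dirtying) the free slots too.
def sweep_and_rebuild(heap)
  freelist = []
  heap.each { |slot| freelist << slot unless slot.live }
  freelist
end

# Proposed alternative: keep a persistent freelist and append a slot
# only at the moment it is freed, so already-free slots are never
# touched again.
def free_slot(slot, freelist)
  slot.live = false
  freelist << slot
end

freelist = sweep_and_rebuild(heap)                   # touches ~5,000 free slots
free_slot(heap.first, freelist) if heap.first.live   # touches exactly one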

Another way to kind of combat this is to use a smaller 'heap chunk'
size (instead of Ruby's exponentially growing one), as this allows
chunks to be freed more frequently, which means they aren't traversed
(basically you don't have as much free memory kicking around, so you
don't traverse it as much). It still leaves all the remaining free
memory to traverse, however.


If you wanted to avoid ever touching freed objects at all, you'd need
to create an 'allocated' list as well, so you could traverse just the
allocated list and then add to the freelist those that had been freed.
That's roughly a 20% per-object size increase. Maybe a good trade-off?
Tough to tell. If I had to guess I'd say the trade-off is worth it for
large, long-standing processes: it would use more RAM and be faster.

Maybe an optimized GC might not be such a bad idea after all :)

-Roger
--
Posted via http://www.ruby-....

M. Edward (Ed) Borasky

11/10/2007 10:32:00 PM

Roger Pack wrote:

[snip]

> Maybe an optimized GC might not be such a bad idea after all :)

I haven't been following 1.9 closely enough to know what it does about
garbage collection. But yes, it does look like the MRI GC could stand
some optimization. Given the trade-offs and use cases, I'd optimize for
Rails. And I'm guessing that on Linux/GCC, a nice tight stop-and-copy GC
might well outperform what's there, and a generational GC would be
better than what's there but not worth the coding effort. I can't help
you on Windows or MacOS ... the memory management there is a black box
to me.

Which brings up an interesting question. While it seems more Ruby
developers work with Macs than with Windows or Linux, where are most of
the Ruby server applications (Rails and otherwise) deployed? I want to
guess Linux, but I don't actually know for a fact that is the case.

As a previous email suggested, there are a couple of use cases for a
garbage collector, only one of them being long-running server
applications. But if the overwhelming majority of Ruby server
applications are Rails on Linux, it would surely be worthwhile tuning
the GC to that. Stop-and-copy integrated with the Linux memory manager
(assume RHEL 5/CentOS 5 64-bit) sounds like a winner off the top of my head.

Hmmm ... maybe I should dual-boot my workstation with CentOS 5 and fool
around with this. ;)

Clifford Heath

11/10/2007 11:29:00 PM

Roger Pack wrote:
> If you wanted to avoid ever accessing freed objects at all, you'd need
> to create an 'allocated' list, as well, so you could just traverse the
> allocated list and then add those to the freelist that were freed. So
> about 20%/object size increase. Maybe a good trade off??? Tough to
> tell. If I guessed I'd say that the trade off is...worth it for large
> long standing processes. It would use more RAM and be faster.

Actually, although it might use more virtual address space, if done
right, it might consume *less* RAM - meaning the physical working set -
simply by not touching pages that aren't in use.

Knowledge and awareness of the VM state often seem to get neglected in
these discussions of GC, even though it's quite easy to compare the VM
effects of the various approaches.
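
For instance, on Linux a few lines of Ruby are enough to watch virtual
address space versus resident set for the current process (this only
parses /proc, so it is Linux-specific):

# Linux-only sketch: compare virtual size (VmSize) with the resident
# working set (VmRSS) of this Ruby process around an allocation spike.
def vm_stats
  stats = {}
  File.read("/proc/self/status").each_line do |line|
    key, value = line.split(/:\s+/)
    stats[key] = value.to_i if %w[VmSize VmRSS].include?(key)
  end
  stats
end

before = vm_stats
junk = Array.new(500_000) { "x" * 50 }   # allocate a few tens of MB
junk = nil
GC.start
after = vm_stats
puts "VmSize: #{before['VmSize']} -> #{after['VmSize']} kB"
puts "VmRSS:  #{before['VmRSS']} -> #{after['VmRSS']} kB"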

Clifford Heath.