
comp.lang.ruby

Why don't Ruby libraries share memory?

Matt Harvey

8/13/2007 6:25:00 PM

This paragraph is motivation. While my question is not Rails-specific, I am
asking it because of Rails. I've been investigating the memory footprint of
my Mongrels. It is nice that they share the .so libraries from ImageMagick
as well as other C libraries. However, each one still has about 20MB in
[heap]. My theory is that a lot of this is coming from ActiveRecord and
friends getting loaded again and again for each Mongrel, which seems to me
entirely unnecessary. My "marginal cost of Apache" is 1376kB. My "marginal
cost of Mongrel" is 27528kB, with the code I wrote. It seems that the latter
could be reduced a lot by sharing some Ruby libraries.

The question is as follows: if I require 'library' in one instance of Ruby
and then require 'library' again in another instance of Ruby, then do I get
duplicate copies of library's code in two chunks of my RAM? (I'm thinking I
do.) Why?


For further details and perhaps clarification, consider the following
script:

require 'smaps_parser'

smaps = SmapsParser.new(Process.pid)
puts smaps.sums.inspect

%w{rubygems active_record action_controller action_view RMagick}.each do |l|
puts "\nRequiring #{l}."
require l
smaps.refresh
puts smaps.sums.inspect
end
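
The smaps_parser library itself isn't shown in the thread. A minimal sketch of what it might look like on Linux follows; the class and method names match the script's usage above, but everything else is guessed:

```ruby
# Hypothetical reconstruction of smaps_parser (not the poster's actual
# code): read /proc/<pid>/smaps and total each field across all mappings.
class SmapsParser
  FIELDS = %w[Size Rss Shared_Clean Shared_Dirty Private_Clean Private_Dirty]

  attr_reader :sums   # e.g. {:rss=>1520, :private_dirty=>544, ...}

  def initialize(pid)
    @pid = pid
    refresh
  end

  def refresh
    @sums = Hash.new(0)
    File.foreach("/proc/#{@pid}/smaps") do |line|
      # Field lines look like "Private_Dirty:       544 kB"
      field, rest = line.split(":", 2)
      @sums[field.downcase.to_sym] += rest.to_i if FIELDS.include?(field)
    end
    @sums
  end
end
```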


Though my Mongrel processes have already (each?) loaded copies of each l,
and though there is nothing "private" about the code in each l, I get the
following output, in which the increase in :private_dirty deserves
particular attention:

{:rss=>1520, :shared_clean=>964, :shared_dirty=>0, :private_clean=>12,
:size=>2968, :private_dirty=>544}

Requiring rubygems.
{:rss=>5032, :shared_clean=>1676, :shared_dirty=>0, :private_clean=>224,
:size=>7476, :private_dirty=>3132}

Requiring active_record.
{:rss=>12920, :shared_clean=>1816, :shared_dirty=>0, :private_clean=>224,
:size=>15452, :private_dirty=>10880}

Requiring action_controller.
{:rss=>18680, :shared_clean=>1828, :shared_dirty=>0, :private_clean=>228,
:size=>21152, :private_dirty=>16624}

Requiring action_view.
{:rss=>21088, :shared_clean=>1828, :shared_dirty=>0, :private_clean=>228,
:size=>23524, :private_dirty=>19032}

Requiring RMagick.
{:rss=>22512, :shared_clean=>2660, :shared_dirty=>0, :private_clean=>228,
:size=>29792, :private_dirty=>19624}


10 Answers

Jano Svitok

8/13/2007 6:57:00 PM


On 8/13/07, Matt Harvey <matt@teamdawg.org> wrote:
> [...]

I suppose the main problem is that Rails (or ActiveRecord, I don't
know exactly which) is not thread-safe. That means you cannot share
most of its data. That is why you have to run several Mongrels
instead of one multi-threaded Mongrel.

I don't know exactly where the problem in Rails/AR lies, though, nor
whether it is even theoretically solvable.

Jano Svitok

8/13/2007 6:59:00 PM


On 8/13/07, Jano Svitok <jan.svitok@gmail.com> wrote:
> On 8/13/07, Matt Harvey <matt@teamdawg.org> wrote:
> > [...]
>
> I suppose the main problem is that Rails (or ActiveRecord, I don't
> know exactly which) is not thread-safe. That means you cannot share
> most of its data. That is why you have to run several Mongrels
> instead of one multi-threaded Mongrel.
>
> I don't know exactly where the problem in Rails/AR lies, though, nor
> whether it is even theoretically solvable.

And one more note: you can save a bit of memory, if you put the thread
safe code into one drb server, although most probably it's not worth
the effort.
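
That suggestion can be sketched with the stdlib's DRb. The Formatter class here is a made-up stand-in for whatever thread-safe code would be shared, and both ends run in one process only so the example is self-contained; in practice the server would be its own process:

```ruby
require 'drb/drb'

# One shared instance lives in a single DRb server; each Mongrel would
# call into it instead of loading the code itself.
class Formatter            # stand-in for the shared, thread-safe logic
  def shout(s)
    s.upcase + "!"
  end
end

DRb.start_service("druby://localhost:0", Formatter.new)  # port 0 = pick a free port

# A client (e.g. one of the Mongrels) talks to the shared instance:
remote = DRbObject.new_with_uri(DRb.uri)
result = remote.shout("hello")
puts result                # prints "HELLO!"
DRb.stop_service
```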

Gregory Brown

8/13/2007 7:25:00 PM


On 8/13/07, Jano Svitok <jan.svitok@gmail.com> wrote:
> On 8/13/07, Matt Harvey <matt@teamdawg.org> wrote:
> > [...]
>
> I suppose the main problem is that Rails (or ActiveRecord, I don't
> know exactly which) is not thread-safe. That means you cannot share
> most of its data. That is why you have to run several Mongrels
> instead of one multi-threaded Mongrel.

It's actually what Wayne mentioned. Since all Ruby classes can be
modified at runtime, it would be very scary to share them across
separate process instances unless you explicitly wanted that behavior.

As a naive example, consider this:

>> require "set"
=> true

>> class Set
>>   def icanhasset
>>     puts "Oh hai, I is an instance method"
>>   end
>> end

>> Set[].icanhasset
Oh hai, I is an instance method

Imagine this shared across separate processes running different types
of code. Any modifications would be shared, and that means that you
couldn't meaningfully modify any classes without expecting problems or
weird bugs. Takes away half of the fun (and utility) of Ruby right
there. :)

-greg

Matt Harvey

8/13/2007 8:20:00 PM


Thanks for your reply, but I am still wondering about a more general
question. It is my understanding that require 'file.rb' will execute the
code in file.rb, so if I require the same file in two different Ruby
processes, then I have duplicates of its classes in memory. If I do require
'c_library.so' then c_library.so will be loaded as a shared library. I
understand that one process might want to override some methods in file.rb,
while the other one might depend on the original versions being intact. This
is a good reason to load the library twice.

In some instances, though, you might know that there are not going to be
overrides, perhaps child classes at most. In this case the classes (not
the objects) could be shared. Is there a way to do "shared Ruby libraries"
and have them act like shared C libraries? (This question reveals my
ignorance of how shared C libraries and OS kernels interact, but I suspect
that C (or any compiled?) libraries are special.) DRb is not really what I
am asking about; I mean to share only classes, not objects.

I have plenty of RAM to run my three Mongrel processes, which are already
overkill for serving a whopping 30 visits and 900 hits per day. (Shameless
plug of http://www.te... if you want to help me out with some more
load.) Therefore, I am not really trying to do anything, just theorizing.

I have seen widespread criticism of Rails as a poorly scalable memory hog,
to which the replies range from "Optimize your code" (often because
ActiveRecord::Base.find generates lots of SELECT * queries), to "Buy more
RAM and servers until you bring down your database" (which will happen
pretty quickly with egregious SELECT *), to "Check your logs; your database
is already the problem." I think Rails is great and Ruby is even greater; in
fact I want to see them take over the world. That could happen a lot faster
if we could address criticisms like the above, and when a library is as
large as ActiveRecord, loading it even one time too many is already cause
for criticism.

Sorry, I started talking about Rails again. The question is not about Rails.
The questions are: Is there any way we can have shared Ruby libraries
without turning the relevant code into a C extension? Is it necessary that
code be compiled to be put into shared memory by the OS? (Feel free to tell
me I'm being really stupid.) For instance, for all the GTK applications you
run your system needs to load GTK only once. It would be really nice if this
could be true of Ruby libraries. I have a feeling that this may just be a
limitation of interpreted languages. Please explain.


"Jano Svitok" <jan.svitok@gmail.com> wrote in message
news:8d9b3d920708131157o49bcaba5ibd19f4d1ae2a52ba@mail.gmail.com...
> [...]

Simon Krahnke

8/13/2007 11:56:00 PM


* Matt Harvey <matt@teamdawg.org> (22:19) wrote:

> Sorry, I started talking about Rails again. The question is not about
> Rails. The questions are: Is there any way we can have shared Ruby
> libraries without turning the relevant code into a C extension? Is it
> necessary that code be compiled to be put into shared memory by the
> OS?

C libraries have two nice features: they are read-only, and they are ready
to use as a disk file. So a modern OS can just map that file into memory
and use the same physical memory for every process accessing the file.

Ruby classes aren't read-only. But you could write the changes into memory
and keep the constant part in a file. So the real problem is that the
sources aren't in a form a modern interpreter can use directly. The data
structures the compiler works on are nowhere on disk; they are created
dynamically when the source files are evaluated.

The source files themselves don't hog memory; they can be freed after
being parsed, or simply memory-mapped. It's the parser's result, the
structures the interpreter works on, that needs the memory.

Ruby would have to use "precompiled" source files to be able to use memory
mapping. It could use copy-on-write to dynamically change the code. There
is still much work being done on Ruby, so there may be a chance of that.

mfg, simon .... l

Daniel DeLorme

8/14/2007 3:15:00 AM


Matt Harvey wrote:
> Sorry, I started talking about Rails again. The question is not about
> Rails. The questions are: Is there any way we can have shared Ruby
> libraries without turning the relevant code into a C extension? Is it
> necessary that code be compiled to be put into shared memory by the OS?

The problem goes further than that. Even if you were to load your libs
in one process and then fork off worker processes (using copy-on-write
to share loaded code), the garbage collector writes to *every* page in
memory when doing a garbage-collection run, thus negating the benefits
of COW. It's fixed in 1.9, thankfully, but 1.8 is going to be a memory
hog no matter which way you look at it.
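
On Linux this can be observed directly; the helper below is a rough sketch, and on a modern MRI (whose GC is far more COW-friendly) the jump in private memory will be much smaller than it was under 1.8:

```ruby
# Allocate a heap in the parent, fork, and compare the child's private
# (unshared) memory before and after a GC run. Under MRI 1.8 the mark
# phase wrote to every live object, turning shared COW pages private.
def private_kb
  File.foreach("/proc/self/smaps").sum do |line|
    line =~ /\APrivate_(?:Clean|Dirty):\s+(\d+) kB/ ? $1.to_i : 0
  end
end

heap = Array.new(100_000) { |i| "string #{i}" }  # inherited via COW

pid = fork do
  before = private_kb
  GC.start
  after = private_kb
  puts "child private memory: #{before} kB before GC, #{after} kB after"
end
Process.wait(pid)
```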

Daniel

Eric Hodel

8/14/2007 3:30:00 AM


On Aug 13, 2007, at 11:25, Matt Harvey wrote:
> This paragraph is motivation. While my question is not Rails-
> specific, I am asking it because of Rails. I've been investigating
> the memory footprint of my Mongrels. It is nice that they share
> the .so libraries from ImageMagick as well as other C libraries.
> However, each one still has about 20MB in [heap]. My theory is that
> a lot of this is coming from ActiveRecord and friends getting
> loaded again and again for each Mongrel, which seems to me entirely
> unnecessary. My "marginal cost of Apache" is 1376kB. My "marginal
> cost of Mongrel" is 27528kB, with the code I wrote. It seems that
> the latter could be reduced a lot by sharing some Ruby libraries.
>
> The question is as follows: if I require 'library' in one instance
> of Ruby and then require 'library' again in another instance of
> Ruby, then do I get duplicate copies of library's code in two
> chunks of my RAM? (I'm thinking I do.) Why?

You'll get closer to the behavior you expect if you use Kernel#fork
to spawn new instances rather than starting up from the shell.

This is how Apache costs only 1376kB.
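
A self-contained sketch of that approach, using the stdlib's Set as a cheap stand-in for the heavy Rails libraries:

```ruby
require 'set'  # stand-in for rubygems/active_record/etc.

# Require the expensive libraries ONCE in the parent, then fork
# workers: each child inherits the already-parsed code through
# copy-on-write instead of re-reading and re-evaluating it.
pids = 2.times.map do
  fork do
    # Set is already defined here -- no require, no re-parse.
    exit Set.new([1, 2, 3]).size  # report back via the exit status
  end
end

statuses = pids.map { |pid| Process.wait2(pid).last.exitstatus }
puts statuses.inspect  # prints "[3, 3]"
```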

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars



Joel VanderWerf

8/14/2007 6:57:00 AM


Daniel DeLorme wrote:

> The problem goes further than that. Even if you were to load your libs
> in one process and then fork off worker processes (using copy-on-write
> to share loaded code), the garbage collector writes to *every* page in
> memory when doing a garbage-collection run, thus negating the benefits
> of COW. It's fixed in 1.9, thankfully, but 1.8 is going to be a memory
> hog no matter which way you look at it.

What does 1.9 do differently?

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Eric Hodel

8/14/2007 9:28:00 AM


On Aug 13, 2007, at 20:14, Daniel DeLorme wrote:
> Matt Harvey wrote:
>> Sorry, I started talking about Rails again. The question is not
>> about Rails. The questions are: Is there any way we can have
>> shared Ruby libraries without turning the relevant code into a C
>> extension? Is it necessary that code be compiled to be put into
>> shared memory by the OS?
>
> The problem goes further than that. Even if you were to load your
> libs in one process and then fork off worker processes (using
> copy-on-write to share loaded code), the garbage collector writes to
> *every* page in memory when doing a garbage-collection run, thus
> negating the benefits of COW. It's fixed in 1.9, thankfully, but
> 1.8 is going to be a memory hog no matter which way you look at it.

For .so files, no, for .rb files, yes.

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars



khaines

8/14/2007 2:40:00 PM
