Asp Forum - Can Anyone Explain This Memory Leak?

Zed A. Shaw

8/25/2006 4:59:00 AM

Hi Folks,

Sorry to get your attention. :-)

There's a very strange problem with Mongrel: when threads pile up
because of the Mutex around Rails dispatching, lots of RAM gets
allocated and never seems to go away.

I boiled the problem down to this:

http://pastie.cabo...

It's a graph of the "leak" and the base code that causes it (nothing
Mongrel in it at all). This code kind of simulates how Mongrel is
managing threads and locking Rails.

What this code does is create threads until there's 1000 in a
ThreadGroup waiting on a Mutex. Inside the guard 30000 integers are put
inside an Array. Don't let this distract you since it can be strings,
or even nothing and you'll see the same thing. It's just to simulate
Rails creating all the stuff it creates, and to demonstrate that while
these objects should go away, they do not.

Then it waits in 10 second increments for these threads to go away,
calling GC.start each time.

And what happens is the graph you see (samples of mem usage of the ruby
process 1/second after 3 cycles of create/destroy threads). Rather than
the memory for the threads and the array of integers going away, it
sticks around. It'll dip a little bit, but not much; it just tops out
there and doesn't drop, even though all the threads are clearly gone and
none of their contents should be around.

In contrast, if you remove the Mutex then the ram behaves as you'd
expect, with it going up and then going away.
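
For readers without access to the pastie, the setup described above can be sketched roughly as follows. This is a hypothetical reconstruction from the description, not the original code: the method name `run_cycle`, its parameters, and the choice to have the main thread hold the lock while spawning are all assumptions, and it is written for a current Ruby rather than 1.8.

```ruby
# Hypothetical reconstruction of the described test, not the original pastie.
# run_cycle and all of its parameters are illustrative names.
def run_cycle(thread_count: 1000, ints_per_thread: 30_000,
              use_mutex: true, poll_seconds: 10)
  guard = Mutex.new
  group = ThreadGroup.new

  # Stand-in for the objects Rails would allocate while dispatching.
  work = lambda { Array.new(ints_per_thread) { |i| i }.size }

  # Assumption: the main thread holds the lock so the workers queue up on it.
  guard.lock if use_mutex

  thread_count.times do
    t = Thread.new do
      if use_mutex
        guard.synchronize { work.call }
      else
        work.call
      end
    end
    group.add(t)
  end

  peak = group.list.size # live threads currently in the group
  guard.unlock if use_mutex

  # Wait in fixed increments for the threads to go away, forcing a GC each time.
  polls = 0
  until group.list.empty?
    sleep poll_seconds
    GC.start
    polls += 1
  end

  { peak_threads: peak, polls: polls, remaining: group.list.size }
end

# Full-size run as described in the post (takes a while):
# 3.times { p run_cycle }
# Same run without the guard, for comparison:
# 3.times { p run_cycle(use_mutex: false) }
```

Watching the process's memory from outside (for example with `ps` or `top`) while each variant runs is what would show whether the two cases differ.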

I'm hoping people way smarter with Ruby than myself can tell me why this
happens, what is wrong with this code, and how to fix it.

Thanks.

--
Zed A. Shaw
http://www.ze...
http://mongrel.ruby...
http://www.lingr.com/room/3... -- Come get help.

20 Answers

Ara.T.Howard

8/25/2006 5:33:00 AM

Kent Sibilev

8/25/2006 5:41:00 AM

Have you tried to run it with the latest 1.8.5 prerelease?
from changelog:

Thu Dec 29 23:59:37 2005 Nobuyoshi Nakada <nobu@ruby-lang.org>

* eval.c (rb_gc_mark_threads): leave unmarked threads which won't wake
up alone, and mark threads in the loading table. [ruby-dev:28154]

* eval.c (rb_gc_abort_threads), gc.c (gc_sweep): kill unmarked
threads. [ruby-dev:28172]

--
Kent
---
http://www.dat...

Srinivas Jonnalagadda

8/25/2006 5:44:00 AM

On Fri, 2006-08-25 at 14:33 +0900, ara.t.howard@noaa.gov wrote:
> i think this is just illustrating reason 42 why i prefer Kernel.fork to
> Thread.new - the only real way to return memory to the os is to exit! ;-)

And why Apache with pre-fork is good for long-running applications :-)

Greetings,
JS

Joel VanderWerf

8/25/2006 5:46:00 AM

ara.t.howard@noaa.gov wrote:
> i don't think you have a leak. try running under electric fence (ef). when i
> do i clearly see the memory rise from 1->20% on my desktop, and then decline
> back down to 1%, over and over with no reported leaks. the cycle matches
> the logging of the script perfectly.
>
> here's the thing though, when i don't run it under electric fence i see the
> memory climb to about 20% and then stay there forever. but this too does not
> indicate a leak. it just shows how calling 'free' in a process doesn't really
> release memory to the os, only to the process itself. the reason you see the
> memory vary nicely under ef is that it replaces the standard malloc/free with
> its own voodoo - details of which i do not understand or care to. the
> point, however, is that it's 'free' which is doing the 'leaking' - just at the
> os level, not the process (ruby) level. we have tons of really long running
> processes that exhibit the exact same behaviour - basically the memory image
> will climb to maximum and stay there. oddly, however, when you tally them all
> up the usage exceeds the system capacity plus swap by miles.

So, to test this hypothesis, the OP could try to instantiate a large
number of objects, and see if there is no effect on the vmsize reported
by the OS, right? Because those objects should be able to use the memory
that is owned by the process, but not used by ruby objects.
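
One way to run the check suggested here might look like the sketch below. It is only an illustration under stated assumptions: `vm_size_kb`, `allocate_and_release` and `reuse_experiment` are made-up helper names, and the size is read from `/proc/self/status`, which exists on Linux but not everywhere (the helper returns nil when it is missing).

```ruby
# Illustrative helpers; none of these names come from the thread.

# Virtual memory size of this process in kB, or nil if it cannot be read.
# Assumption: a Linux-style /proc/self/status with a "VmSize:" line.
def vm_size_kb
  line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmSize:') }
  line && line[/\d+/].to_i
rescue SystemCallError
  nil
end

# Build a batch of throwaway objects and let them go out of scope.
def allocate_and_release(count)
  junk = Array.new(count) { |i| "object #{i}" }
  junk.size
end

# Allocate, collect, then allocate the same amount again and compare sizes.
def reuse_experiment(count: 200_000)
  before = vm_size_kb
  allocate_and_release(count)
  GC.start
  after_first = vm_size_kb
  allocate_and_release(count)
  GC.start
  after_second = vm_size_kb
  { before: before, after_first: after_first, after_second: after_second }
end

# p reuse_experiment
```

If freed memory really stays with the process and is reused, the second round should add little or nothing on top of the first; a second jump of similar size would point the other way.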

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Zed A. Shaw

8/25/2006 5:51:00 AM

On Fri, 2006-08-25 at 14:33 +0900, ara.t.howard@noaa.gov wrote:
> On Fri, 25 Aug 2006, Zed Shaw wrote:
>
> > Hi Folks,
> >
> > Sorry to get your attention. :-)
> >
> > There's a very strange problem with Mongrel where if Threads are created
> > because of the Mutex around Rails dispatching, then lots of ram gets
> > created that never seems to go away.
> >
> > I boiled the problem down to this:
> >
> > http://pastie.cabo...
> >
> hi zed-
>
> i don't think you have a leak. try running under electric fence (ef). when i
> do i clearly see the memory rise from 1->20% on my desktop, and then decline
> back down to 1%, over and over with no reported leaks. the cycle matches
> the logging of the script perfectly.
>
> here's the thing though, when i don't run it under electric fence i see the
> memory climb to about 20% and then stay there forever. but this too does not
> indicate a leak. it just shows how calling 'free' in a process doesn't really
> release memory to the os, only to the process itself. the reason you see the
> memory vary nicely under ef is that it replaces the standard malloc/free with
> its own voodoo - details of which i do not understand or care to. the
> point, however, is that it's 'free' which is doing the 'leaking' - just at the
> os level, not the process (ruby) level. we have tons of really long running
> processes that exhibit the exact same behaviour - basically the memory image
> will climb to maximum and stay there. oddly, however, when you tally them all
> up the usage exceeds the system capacity plus swap by miles.
>

Nope, I can't agree with this because the ram goes up, the OS will kill
it eventually, and if I remove the guard the ram doesn't do this.

And where are you getting your information that free doesn't free
memory? I'd like to read that since all my years of C coding say that
is dead wrong. Care to tell me how malloc/free would report 80M with
the Mutex but properly show the ram go down when there is no Mutex?

And why would Linux kill processes if the ram gets too high? Why would
whole VPS servers crash? I mean if this ram was just "fake" reporting
(which is very hard to believe) then why are all these things happening?

So please, point me at where in the specifications for malloc/free on
Linux it says that the memory reported will be high even though free and
malloc are called on 80M of ram slowly cycled out, and that Linux will
still kill your process even though this ram is not really owned by the
process.

--
Zed A. Shaw
http://www.ze...
http://mongrel.ruby...
http://www.lingr.com/room/3... -- Come get help.

Zed A. Shaw

8/25/2006 5:56:00 AM

On Fri, 2006-08-25 at 14:41 +0900, Kent Sibilev wrote:
> Have you tried to run it with the latest 1.8.5 prerelease?
> from changelog:
>
> Thu Dec 29 23:59:37 2005 Nobuyoshi Nakada <nobu@ruby-lang.org>
>
> * eval.c (rb_gc_mark_threads): leave unmarked threads which won't wake
> up alone, and mark threads in the loading table. [ruby-dev:28154]
>
> * eval.c (rb_gc_abort_threads), gc.c (gc_sweep): kill unmarked
> threads. [ruby-dev:28172]
>

You gotta be kidding me. A damn bug? Oh no, according to ara.t.howard
it's because free doesn't actually free.

Man, two days wasted for nothing.

I'll try 1.8.5 tomorrow. I'm kind of tired of this to be honest.

--
Zed A. Shaw
http://www.ze...
http://mongrel.ruby...
http://www.lingr.com/room/3... -- Come get help.

Ara.T.Howard

8/25/2006 5:57:00 AM

Zed A. Shaw

8/25/2006 6:01:00 AM

On Fri, 2006-08-25 at 14:46 +0900, Joel VanderWerf wrote:

> So, to test this hypothesis, the OP could try to instantiate a large
> number of objects, and see if there is no effect on the vmsize reported
> by the OS, right? Because those objects should be able to use the memory
> that is owned by the process, but not used by ruby objects.
>

This is actually what I refer to in the no-Mutex situation. Create a
ton of threads without a mutex in them and the ram goes away. The
evidence doesn't support the claims at all.

Also the fact that the OS is killing these processes and swap is getting
used indicates that this is real memory being lost.

But, a reply from Kent Sibilev says this could be a bug in 1.8.4. So
there's even more evidence that it is a leak.

--
Zed A. Shaw
http://www.ze...
http://mongrel.ruby...
http://www.lingr.com/room/3... -- Come get help.

Ara.T.Howard

8/25/2006 6:59:00 AM

Ara.T.Howard

8/25/2006 7:10:00 AM

comp.lang.ruby
