[ANN] Mongrel 0.3.13.4 Pre-Release -- Ruby's LEAK Fixed (Death To Mutex!

Zed A. Shaw

8/26/2006 3:31:00 AM

Howdy Folks,

This release is after painstaking analysis of a memory leak that was
reported by Bradley Taylor, reduced by myself, and then fixed after much
work. You should all thank Bradley for finding the bizarre fix.

It turns out that Ruby has a memory leak when you use pretty much any
thread locking primitive other than Sync (Mutex, Monitor, etc.):

http://pastie.cabo...

The fix (for whatever reason) is to use Sync and put it in a block:

http://pastie.cabo...

Those two scripts are mini versions of how Mongrel manages threads,
written so that I could figure out a solution or get some input. The
graphs show reported RAM usage sampled once per second. As you can see,
the first (leaking) graph goes up and doesn't come back down, while the
second (fixed) graph cycles properly.
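
For readers who can't reach the pastie links, here is a minimal sketch of the pattern being described: many threads piling up behind one guard, with the critical section wrapped in a block. It is not either of the original scripts. The names `run_guarded` and `GUARD_CLASS` and the thread count are hypothetical. On Ruby 1.8, Sync comes from `require 'sync'`; it is an assumption of this sketch that falling back to Mutex is acceptable when Sync can't be loaded, since both accept the same `synchronize { ... }` call.

```ruby
require 'thread' # needed for Mutex on Ruby 1.8; harmless on later versions

# Assumption: prefer Sync when it can be loaded, otherwise use Mutex.
begin
  require 'sync'
  GUARD_CLASS = Sync
rescue LoadError
  GUARD_CLASS = Mutex
end

# Hypothetical helper: start exactly +count+ threads that all queue up
# behind one guard, and return how many got through the guarded section.
def run_guarded(count, guard = GUARD_CLASS.new)
  processed = 0
  threads = Array.new(count) do
    Thread.new do
      # The block form releases the lock even if the body raises.
      guard.synchronize do
        processed += 1
      end
    end
  end
  threads.each { |t| t.join }
  processed
end

puts run_guarded(100) # prints 100
```

Building the threads with `Array.new(count)` means the number started is exactly `count`, so a memory graph taken from a run like this is not skewed by how many threads happened to be created.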

** This is a Ruby issue, so if you have software using Mutex or Monitor,
change to Sync now. **

Tests of this latest pre-release show that the RAM is properly cycled by
the GC and that it's actually finally solved. If you run your app using
this release and you still have a leak then use the memory debugging
tools mongrel has to rule out your code (see below).

CHANGES

* No more allow_concurrency. Until Ruby's fixed I can't let people do
this anymore.
* USR1 debugging. If you're wondering about how Mongrel's locking of
Rails impacts your application, or what is causing BAD CLIENT then just
hit your mongrel_rails with USR1 and Mongrel will tell you.
* More extensive and accurate memory debugging. Use -B and look at the
log/mongrel_log/objects.log to get a good idea of counts of objects,
delta changes in counts, and mean+standard deviation lengths of objects
with length methods.
* Fixes a few places where sockets are closed and left in CLOSE_WAIT.
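
To give a feel for the numbers the -B memory debugging bullet talks about (object counts, deltas between samples, and mean plus standard deviation of lengths), here is a rough sketch. It is not Mongrel's implementation, and the helper names `snapshot`, `count_delta` and `mean_sd` are made up for illustration. It assumes a Ruby where `ObjectSpace.each_object` is available.

```ruby
# Count live objects for a few classes. ObjectSpace.each_object returns
# the number of objects it visited, so an empty block is enough.
def snapshot(classes = [String, Array, Hash])
  counts = {}
  classes.each { |k| counts[k.name] = ObjectSpace.each_object(k) { } }
  counts
end

# Difference between two snapshots, keeping only entries that changed.
def count_delta(before, after)
  delta = {}
  (before.keys | after.keys).each do |key|
    change = after.fetch(key, 0) - before.fetch(key, 0)
    delta[key] = change unless change.zero?
  end
  delta
end

# Mean and (population) standard deviation of a list of lengths.
def mean_sd(lengths)
  return [0.0, 0.0] if lengths.empty?
  mean = lengths.inject(0.0) { |sum, x| sum + x } / lengths.size
  variance = lengths.inject(0.0) { |sum, x| sum + (x - mean)**2 } / lengths.size
  [mean, Math.sqrt(variance)]
end

before = snapshot
junk = Array.new(1000) { |i| "request #{i}" } # allocate some strings
p count_delta(before, snapshot)
p mean_sd(junk.map { |s| s.length })
```

A count that keeps climbing from one sample to the next, with no matching drop after a GC pass, is the sort of pattern worth chasing in your own code.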

INSTALLING

As per usual:

sudo gem install mongrel --source=http://mongrel.rubyforge.org...

Initial tests show it works on 1.8.5 and is actually faster, but this is
unsupported for now.

TESTING THIS RELEASE

If you want to test the memory leak, here's the process:

1) Start your application in *production* mode:
mongrel_rails start -e production

2) Hit it with USR1:
killall -USR1 mongrel_rails

3) Start running something that prints out the ram (here's my fish
code):
while sleep 1
ps aux | grep mongrel_rails | grep -v grep | grep -v gvim | ruby -aln -e "puts split[4 .. 5].join(',')"
end

4) Thrash a simple rails controller with httperf:
httperf --server 127.0.0.1 --port 3000 --num-conns 1000 --rate 120 --uri /testuri

What you want to do is adjust num-conns and rate until Mongrel reports
"X threads waiting for /testuri..."

The bug only manifests itself when threads pile up behind the guard
around Rails dispatching. This is also how you'd find out which Rails
actions are too slow.
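
If you don't use fish, the sampler from step 3 can be sketched in plain Ruby. The `split[4 .. 5]` in the one-liner picks the fifth and sixth whitespace-separated columns of `ps aux`, which are VSZ and RSS. The helper names `vsz_rss` and `sample_memory` and the sample line below are hypothetical, for illustration only.

```ruby
# Pull the VSZ and RSS columns out of one line of `ps aux` output.
# Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
def vsz_rss(ps_line)
  ps_line.split[4..5].join(',')
end

# Print VSZ,RSS for every process whose ps line contains +pattern+,
# once per +interval+ seconds, +samples+ times.
def sample_memory(pattern, interval = 1, samples = 60)
  samples.times do
    `ps aux`.each_line do |line|
      next unless line.include?(pattern)
      next if line.include?('grep') || line.include?('gvim')
      puts vsz_rss(line)
    end
    sleep interval
  end
end

# Made-up sample line, not real output:
line = "zed 4242 1.5 2.3 61234 30120 pts/1 S 03:31 0:07 ruby mongrel_rails start -e production"
puts vsz_rss(line) # prints 61234,30120
```

Calling `sample_memory('mongrel_rails')` in a second terminal while httperf runs gives one VSZ,RSS pair per second, which can be pasted straight into a spreadsheet to graph.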

Please report any bugs you find in this release, and a Win32 release
will come out after I'm sure it works for everyone else.

--
Zed A. Shaw
http://www.ze...
http://mongrel.ruby...
http://www.lingr.com/room/3... -- Come get help.

23 Answers

Ara.T.Howard

8/26/2006 9:12:00 AM

Zed A. Shaw

8/26/2006 12:46:00 PM

On Sat, 2006-08-26 at 18:12 +0900, ara.t.howard@noaa.gov wrote:
> On Sat, 26 Aug 2006, Zed Shaw wrote:
>
> > Howdy Folks,
> >
> > This release is after painstaking analysis of a memory leak that was
> > reported by Bradley Taylor, reduced by myself, and then fixed after much
> > work. You should all thank Bradley for finding the bizarre fix.

>
> if you are really serious about fixing your leak i suggest you re-work your
> tests. as i mentioned before they have several race conditions, not least of
> which that they both start a random number of threads, not 1000 as the code
> suggests (you can easily confirm by printing out the number of times the
> thread init loop executes). further, sync.rb is the single ruby lib i've had
> memory issues with on production systems. i have never managed to figure out
> why that is...

Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

As I mentioned to you before, the evidence shows you are wrong. Sure
you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed
does operate the way Mongrel does. It's called a "bug reduction". I'm
not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

Also, as I said before, this shows the leak:

http://pastie.cabo...

And this script, with just 3 lines changed to use Sync shows it's fixed:

http://pastie.cabo...

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

Not to mention about 8 other people switching to Sync report their leaks fixed,
our own test script showing it's fixed, Mongrel tests showing it's fixed,
several other frameworks showing it, and you can't argue with the evidence.

If your script has a leak then fine, just don't do that. Ultimately though the
ruby lang guys need to fix this because either way, there's leaks. For now,
Mongrel is not leaking and I'm happy with that.

Now, I'd appreciate it if you'd maybe spend your energy looking into the ruby
source to find this leak rather than bothering me about it.

Thank you.


William Crawford

8/26/2006 1:14:00 PM

Zed Shaw wrote:
> On Sat, 2006-08-26 at 18:12 +0900, ara.t.howard@noaa.gov wrote:
>> On Sat, 26 Aug 2006, Zed Shaw wrote:
>>
>> > Howdy Folks,
>> >
>> > This release is after painstaking analysis of a memory leak that was
>> > reported by Bradley Taylor, reduced by myself, and then fixed after much
>> > work. You should all thank Bradley for finding the bizarre fix.
>
>>
>> if you are really serious about fixing your leak i suggest you re-work your
>> tests. as i mentioned before they have several race conditions, not least of
>> which that they both start a random number of threads, not 1000 as the code
>> suggests (you can easily confirm by printing out the number of times the
>> thread init loop executes). further, sync.rb is the single ruby lib i've had
>> memory issues with on production systems. i have never managed to figure out
>> why that is...
>
> Ara, this is uncool. Seriously man, don't e-mail me personally and then
> e-mail the list the exact same e-mail. Pick one and fight your battle
> there.

Maybe he simply cc'd you on the reply to the list? I doubt it was a
personal attack. At worst, probably a mistake. He's trying to be
helpful.

> As I mentioned to you before, the evidence shows you are wrong. Sure
> you've cooked up a script that has a memory leak with Sync, but that
> script doesn't operate the way Mongrel does. The sample I developed
> does operate the way Mongrel does. It's called a "bug reduction". I'm
> not going to test the leak in Mongrel with a reduction that doesn't
> simulate Mongrel.

I think the point here is that it isn't necessarily Sync/Mutex/Ruby, but
the way you use it. You managed to show Mutex leaking, he managed to
show Sync leaking. And you each managed to show neither of them
leaking.

> With graphs even Ara! Graphs! We ran these tests for 30-40 minutes
> with
> hundreds of thousands of threads being cycled during the tests.

Graphs don't make it true.

> Not to mention about 8 other people switching to Sync report their leaks
> fixed,
> our own test script showing it's fixed, Mongrel tests showing it's
> fixed,
> several other frameworks showing it, and you can't argue with the
> evidence.

This has only been tested for a day. There's still a real possibility
that what you're seeing isn't what you think you're seeing. We've all
made that mistake more than once.

> If your script has a leak then fine, just don't do that. Ultimately
> though the
> ruby lang guys need to fix this because either way, there's leaks. For
> now,
> Mongrel is not leaking and I'm happy with that.
>
> Now, I'd appreciate it if you'd maybe spend your energy looking into the
> ruby
> source to find this leak rather than bothering me about it.
>
> Thank you.

He's doing the same thing you are. Trying to expose the leak for what
it is. You can't go telling him to 'fix ruby' rather than bother the
list when you've done the same thing he has, just a day earlier. He's
submitting the information he has collected so others can try to figure
this out as well.

None of this is a personal attack. Nobody has said you didn't fix a
leak. There simply may be more going on here than anybody has seen yet.

--
Posted via http://www.ruby-....

Bob Hutchison

8/26/2006 2:22:00 PM

On Aug 26, 2006, at 5:12 AM, ara.t.howard@noaa.gov wrote:

> in any case, i'd carefully examine your tests (or the rails code if
> that is
> indeed what it's modeled after) to make sure that they test
> Mutex/Sync/Thread/Ruby and not your os virtual memory system and
> look closely
> at the results again - like i said, i have had issues with sync.rb.
>
> the point here is that it is probably the code in question and not
> Mutex per
> se that was causing your process to grow in vmsize.
>

I ran your test on OS/X looking at VSZ and RSS. And, like you,
initially got Sync with no leak visible, and mutex with what looks
like a bad leak. However, I notice that you only called GC once. I
have a years old habit of always running GC at least three times when
I really wanted GC to run (and in Java I had a loop that ran GC until
it stopped freeing stuff which in some cases was eight or nine
times). Superstition? Apparently not. On OS X, when I run GC three
times neither sync nor mutex show a memory leak.

Zed, just for fun, try running GC a few times in a row (like
GC.start; GC.start; GC.start) .
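
The "run GC until it stops freeing stuff" loop mentioned above can be sketched in Ruby roughly like this. It assumes MRI 1.9 or later, where `ObjectSpace.count_objects` exists and reports `:TOTAL` and `:FREE` slots; the helper names `live_slots` and `gc_until_stable` are made up for this example.

```ruby
# Live slots as reported by MRI: total heap slots minus free ones.
def live_slots
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

# Run GC.start until a pass frees nothing more, or until max_rounds
# passes have been made. Returns the number of passes performed.
def gc_until_stable(max_rounds = 10)
  rounds = 0
  previous = live_slots
  while rounds < max_rounds
    GC.start
    rounds += 1
    current = live_slots
    break if current >= previous # this pass freed nothing
    previous = current
  end
  rounds
end

puts "GC passes needed: #{gc_until_stable}"
```

The cap on rounds is there so the loop always ends, even if each pass keeps freeing a little. How many passes it takes will vary by platform and Ruby version.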

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.rec...
hutch/>
Recursive Design Inc. -- <http://www.rec...>
Raconteur -- <http://www.raconteur...
xampl for Ruby -- <http://rubyforge.org/projects/...

Bob Hutchison

8/26/2006 3:20:00 PM

On Aug 26, 2006, at 10:22 AM, Bob Hutchison wrote:

>
> On Aug 26, 2006, at 5:12 AM, ara.t.howard@noaa.gov wrote:
>
>> in any case, i'd carefully examine your tests (or the rails code
>> if that is
>> indeed what it's modeled after) to make sure that they test
>> Mutex/Sync/Thread/Ruby and not your os virtual memory system and
>> look closely
>> at the results again - like i said, i have had issues with sync.rb.
>>
>> the point here is that it is probably the code in question and not
>> Mutex per
>> se that was causing your process to grow in vmsize.
>>
>
>
> I ran your test on OS/X looking at VSZ and RSS. And, like you,
> initially got Sync with no leak visible, and mutex with what looks
> like a bad leak. However, I notice that you only called GC once. I
> have a years old habit of always running GC at least three times
> when I really wanted GC to run (and in Java I had a loop that ran
> GC until it stopped freeing stuff which in some cases was eight or
> nine times). Superstition? Apparently not. On OS X, when I run GC
> three times neither sync nor mutex show a memory leak.
>
> Zed, just for fun, try running GC a few times in a row (like
> GC.start; GC.start; GC.start)

Well I tried your test on OS X. The Sync had no problem, the mutex
showed the memory growth (though it eventually (fifth iteration I
think) cleaned itself up). I modified your test to create exactly
1000 threads and call GC three times at the end, things were better,
i.e. it released its memory more quickly than without, but still not
good. I ended up with:

GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1

and this made a bigger difference. The memory usage was much more
tightly bound.

(And yes, the three calls to sync are also on purpose... in the late
70s through the 80s, calling sync once didn't guarantee anything, you
had to call it a few times, three generally worked... I don't know
the current situation because it is easy enough to type
sync;sync;sync (well, in truth, I usually alias sync to the three
calls))

But of course, the point is that despite appearances there is likely
no memory leak at all on OS X, just some kind of long term cycle of
process resource utilisation -- this is a complex situation, Ruby GC,
process resource utilisation/optimisation, and system optimisation
all interacting. Who knows what's actually going on.

So.

Cheers,
Bob


Ara.T.Howard

8/26/2006 3:51:00 PM

M. Edward (Ed) Borasky

8/26/2006 4:19:00 PM

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

khaines

8/27/2006 2:30:00 AM

Ara.T.Howard

8/27/2006 2:35:00 AM

M. Edward (Ed) Borasky

8/27/2006 3:04:00 AM

Bob Hutchison wrote:

[snip]
> Well I tried your test on OS X. The Sync had no problem, the mutex
> showed the memory growth (though it eventually (fifth iteration I think)
> cleaned itself up). I modified your test to create exactly 1000 threads
> and call GC three times at the end, things were better, i.e. it released
> its memory more quickly than without, but still not good. I ended up with:
>
> GC.start
> `sync; sync; sync`
> sleep 1
> GC.start
> `sync; sync; sync`
> sleep 1
> GC.start
> `sync; sync; sync`
> sleep 1
> GC.start
> `sync; sync; sync`
> sleep 1
>
> and this made a bigger difference. The memory usage was much more
> tightly bound.
>
> (And yes, the three calls to sync are also on purpose... in the late 70s
> through the 80s, calling sync once didn't guarantee anything, you had to
> call it a few times, three generally worked... I don't know the current
> situation because it is easy enough to type sync;sync;sync (well, in
> truth, I usually alias sync to the three calls))
>
> But of course, the point is that despite appearances there is likely no
> memory leak at all on OS X, just some kind of long term cycle of process
> resource utilisation -- this is a complex situation, Ruby GC, process
> resource utilisation/optimisation, and system optimisation all
> interacting. Who knows what's actually going on.

Finally someone with some platform details!! <vbg>

OK ... here's my take

1. The OS "three-sync" thing is, as you pointed out, a throwback to days
when you needed to do that sort of thing. It's superstition.

2. IIRC OS X is a BSD-type kernel rather than a Linux kernel, so at
least we know a different memory manager still needs "help" to deal with
this kind of application.

3. Typing a single "sync" into a *Linux* kernel when there's a lot of
"stuff" built up in RAM is a very bad idea. It will force the system
into an I/O bound mode and lock everybody out until the kernel has
cleaned up after itself. Either OS X has a better memory and I/O manager
than Linux, or you didn't have a lot of "stuff" built up from this
simple test. The second and third syncs are both unnecessary and
harmless. :)

4. Deleting references to no-longer-needed objects and then explicitly
calling the garbage collector has a longer history and tradition than
UNIX. It is "standard software engineering practice" in any environment
that has garbage collection. Just last week, I had to stick such a call
into an R program to keep it from crashing (on a Windows machine with 2
GB of RAM!)

For the software engineering philosophers on the list, what's the
difference between a language that forces the engineer to explicitly
manage dynamic memory allocation and de-allocation, and one that
supposedly relieves the engineer from that need -- until you crash in a
production system that worked on smaller test cases a couple of months
ago? :)

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.
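
As a small illustration of point 4, here is what "drop the reference, then call the collector" looks like in Ruby. This is only a sketch: `build_and_release` is a hypothetical name, and because Ruby's collector scans the stack conservatively there is no guarantee the array is reclaimed on any particular pass.

```ruby
# Build a large temporary structure, use it, then drop the only
# reference and ask the collector to run before doing anything else.
def build_and_release(rows)
  data = Array.new(rows) { |i| "row #{i}" }
  total = data.inject(0) { |sum, s| sum + s.length }
  data = nil # the array is now unreachable from this method
  GC.start   # request a collection; reclamation is not guaranteed
  total
end

puts build_and_release(100_000)
```

The useful part is the ordering: the result is computed first, the reference is cleared, and only then is the collector asked to run.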

comp.lang.ruby
