Asp Forum - building Ruby with dmalloc

Young Hyun

1/10/2007 6:55:00 PM

Has anyone managed to build Ruby with dmalloc support? I'm having
numerous problems trying to do so on MacOS X 10.4.8 and FreeBSD 6.1.
I'm trying to hunt down a memory leak in Ruby, probably in the Mutex
code. Also, fastthread 0.6.1 is crashing for me, possibly due to a
memory corruption, and I'd like to figure out what's going on there.

--Young

9 Answers

khaines

1/10/2007 7:04:00 PM

MenTaLguY

1/10/2007 9:39:00 PM

On Thu, 2007-01-11 at 03:54 +0900, Young Hyun wrote:
> Has anyone managed to build Ruby with dmalloc support? I'm having
> numerous problems trying to do so on MacOS X 10.4.8 and FreeBSD 6.1.
> I'm trying to hunt down a memory leak in Ruby, probably in the Mutex
> code. Also, fastthread 0.6.1 is crashing for me, possibly due to a
> memory corruption, and I'd like to figure out what's going on there.

It's not quite as shiny as dmalloc, but have you tried electric fence
(libefence)? On Linux, at least, you can use it via LD_PRELOAD, without
recompiling Ruby.

Also, do you see the memory corruption under both MacOS X and FreeBSD?

-mental

Young Hyun

1/11/2007 8:37:00 PM

On Jan 10, 2007, at 1:39 PM, MenTaLguY wrote:

> It's not quite as shiny as dmalloc, but have you tried electric fence
> (libefence)? On Linux, at least, you can use it via LD_PRELOAD, without
> recompiling Ruby.

I haven't tried electric fence, but thanks for mentioning it.

I've given up trying to build Ruby with dmalloc support now that I've
learned that MacOS X has built-in support for dmalloc-like memory
debugging. These debugging features are available automatically in
the standard malloc routines, and I don't have to do anything special
in the build process of Ruby. For the curious, more information is
available in Apple's Tech Note TN2124:

http://developer.apple.com/technotes/tn2004/tn2124.html...
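
For anyone wanting to try the same thing, here is a minimal sketch of launching a script with some of the malloc debugging variables set. The variable names are the ones documented in malloc(3) on Mac OS X; the helper name `run_with_malloc_debugging` is made up for illustration, and the `system(env, ...)` form assumes Ruby 1.9 or later (on 1.8 you would set the variables in the shell instead). On other platforms the variables are simply ignored.

```ruby
require 'rbconfig'

# Environment variables read by Mac OS X's malloc (see malloc(3) and TN2124).
MALLOC_DEBUG_ENV = {
  "MallocStackLogging" => "1",  # record the call stack of each allocation
  "MallocScribble"     => "1",  # overwrite freed memory with 0x55
  "MallocGuardEdges"   => "1"   # add guard pages around large allocations
}.freeze

# Hypothetical helper (not from the original post): run a script under the
# current Ruby interpreter with the debugging variables added to its
# environment. Returns true when the child exits with status 0.
def run_with_malloc_debugging(script, extra_env = {})
  system(MALLOC_DEBUG_ENV.merge(extra_env), RbConfig.ruby, script)
end
```

The same helper could in principle be used for libgmalloc by passing `"DYLD_INSERT_LIBRARIES" => "/usr/lib/libgmalloc.dylib"` as `extra_env`, as described in the libgmalloc man page.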

> Also, do you see the memory corruption under both MacOS X and FreeBSD?

Yes, there seems to be memory corruption under both. Here's what I
get under FreeBSD 6.1-STABLE-200607. I have two processes
communicating over SSL that abort in two different ways. The first
dies with 'rb_gc_mark(): unknown data type', which causes the second
to crash at exit (after I hit ^C) with '[BUG] Segmentation
fault' (I've included the actual messages below). I don't get these
crashes if I run my program without fastthread (but my program can't
run for long periods of time without fastthread because of a serious
memory leak, so I can't say with 100% certainty that these crashes
don't happen without fastthread).

If I immediately interrupt (^C) the server process before the client
has connected to it, then I don't get '[BUG] Segmentation fault'.
Threads are already running at this point (and require 'fastthread'
has executed), but perhaps mutex operations haven't been done yet.
Now, once a client connects, and some communication takes place
(definitely causing mutexes/fastthread to be used), I get a segfault
if I interrupt the server process. Here's the transcript:

$ ~/ruby-1.8.5-p12/bin/ruby g.rb
Waiting for clients on port 8742...
^C./globalserver.rb:62:in `join': Interrupt
from ./globalserver.rb:62:in `join'
from g.rb:10
$ ~/ruby-1.8.5-p12/bin/ruby g.rb
Waiting for clients on port 8742...
accepted connection from 192.172.226.88
GlobalSpaceDemux: got hello from $Id: globalmux.rb,v 1.43 2006/12/13 20:51:49 youngh Exp $, protocol 1
Waiting for clients on port 8742...
^C./globalserver.rb:62:in `join': Interrupt
from ./globalserver.rb:62:in `join'
from g.rb:10
./globalserver.rb:62: [BUG] Segmentation fault
ruby 1.8.5 (2006-12-25) [i386-freebsd6.1]

Abort trap: 6 (core dumped)

---------------------------------
### building ruby:
export CFLAGS=-g # prevent building with -O2
./configure --prefix=/home/youngh/ruby-1.8.5-p12 --enable-pthread

### building fastthread-0.6.1 with '~/ruby-1.8.5-p12/bin/ruby setup.rb':
gcc -I. -I/home/youngh/ruby-1.8.5-p12/lib/ruby/1.8/i386-freebsd6.1 -I/home/youngh/ruby-1.8.5-p12/lib/ruby/1.8/i386-freebsd6.1 -I/home/youngh/ruby/fastthread-0.6.1/ext/fastthread -fPIC -g -c fastthread.c
gcc -shared -Wl,-soname,fastthread.so -L'/home/youngh/ruby-1.8.5-p12/lib' -Wl,-R'/home/youngh/ruby-1.8.5-p12/lib' -o fastthread.so fastthread.o -lpthread -lcrypt -lm -lc

### incidentally, I get the same/similar problems if I build ruby
without --enable-pthread

### for both crashes, the stack is corrupted--notice how rb_bug() is
at frame #97
### the stack doesn't get corrupted this badly or at all with MacOS X
(running on PowerPC)

$ gdb -c ruby.core
(gdb) file /home/youngh/ruby-1.8.5-p12/bin/ruby
Reading symbols from /home/youngh/ruby-1.8.5-p12/bin/ruby...done.
(gdb) bt
#0 0x2814a537 in ?? ()
#1 0x28137f71 in ?? ()
#2 0x00000000 in ?? ()
#3 0x00000004 in ?? ()
#4 0x00000006 in ?? ()
#5 0x00000005 in ?? ()
#6 0x28127c00 in ?? ()
#7 0x28127500 in ?? ()
#8 0x28127600 in ?? ()
#9 0x28127700 in ?? ()
#10 0x28127800 in ?? ()
#11 0x28127900 in ?? ()
#12 0x2810256a in ?? ()
#13 0x28127b00 in ?? ()
#14 0x28127c00 in ?? ()
#15 0x00000020 in ?? ()
#16 0x00000000 in ?? ()
#17 0x00000000 in ?? ()
#18 0x00000000 in ?? ()
#19 0x00000000 in ?? ()
#20 0x00000000 in ?? ()
#21 0x00000000 in ?? ()
#22 0x0000000d in ?? ()
#23 0x0000000d in ?? ()
#24 0x28142819 in ?? ()
#25 0x2814d4b4 in ?? ()
#26 0x083b6400 in ?? ()
#27 0xbfbfd9d4 in ?? ()
...
#90 0x00000258 in ?? ()
#91 0x083aefd0 in ?? ()
#92 0x00000001 in ?? ()
#93 0x2814d4b4 in ?? ()
#94 0xbfbfe230 in ?? ()
#95 0x00000002 in ?? ()
#96 0xbfbfded8 in ?? ()
#97 0x080de0be in rb_bug (fmt=0x8116000 "@?%(\025?\233???\020\b")
at error.c:214
Previous frame inner to this frame (corrupt stack?)
(gdb)

-----------------------
### crash in the client process:
### the location given varies per run--this isn't a YAML bug
/home/youngh/ruby-1.8.5-p12/lib/ruby/1.8/yaml/rubytypes.rb:360: [BUG]
rb_gc_mark(): unknown data type 0x20(0x83dffb0) non object
ruby 1.8.5 (2006-12-25) [i386-freebsd6.1]

Abort trap: 6 (core dumped)

--Young

Young Hyun

1/13/2007 1:13:00 AM

On Jan 12, 2007, at 3:32 PM, MenTaLguY wrote:

> On Fri, 2007-01-12 at 05:37 +0900, Young Hyun wrote:
>> On Jan 10, 2007, at 1:39 PM, MenTaLguY wrote:
>>
>> I've given up trying to build Ruby with dmalloc support now that I've
>> learned that MacOS X has built-in support for dmalloc-like memory
>> debugging.
>
> Have you gotten any useful reports from the memory debugging facility?

Yes, I have. It showed that huge amounts of memory (500MB in a matter
of minutes) were being used by the realloc() call in
rb_thread_save_context. The call stack is something like:

rb_ary_collect (or rb_ary_each in half the cases)
rb_yield
...
rb_callcc
rb_thread_save_context
realloc

(Incidentally, the call sequence rb_thread_schedule ->
rb_thread_save_context wasn't eating up memory.)

I got the same behavior with and without fastthread.

I finally tracked down the memory leak, and it's in
SyncEnumerator#each rather than in any thread synchronization class.
If I refrain from using SyncEnumerator, then my program's memory
usage holds steady at around 33MB. Sorry for the wild goose chase,
but Mutex & company definitely have a bad reputation, and they seemed
the most likely candidates. I still need to investigate why exactly
I'm getting such poor behavior from SyncEnumerator#each. I did
notice that at least one other person has had this problem and
reported it to ruby-talk:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...
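
As a stopgap, parallel iteration over plain arrays does not need SyncEnumerator at all. The sketch below assumes both collections are finite arrays; the names `hosts` and `ports` and their contents are invented for illustration and are not from my program. `Array#zip` pairs the elements up front, so no Generator or continuation is involved and the rb_callcc path in the call stack above is never taken.

```ruby
# Illustrative data only; not from the original program.
hosts = %w[alpha beta gamma]
ports = [8742, 8743, 8744]

# Instead of SyncEnumerator.new(hosts, ports).each { |host, port| ... },
# pair the elements eagerly with Array#zip.
pairs = []
hosts.zip(ports).each do |host, port|
  pairs << "#{host}:#{port}"
end

# zip follows the receiver's length: it pads with nil when the argument is
# shorter and drops extra elements when the argument is longer. To visit
# every element of both arrays regardless of length, iterate by index.
longest = [hosts.size, ports.size].max
rows = (0...longest).map { |i| [hosts[i], ports[i]] }
```

This trades laziness for simplicity, so it would not suit sources that are not arrays or that are too large to hold in memory.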

>> I don't get these crashes if I run my program without fastthread
>> (but my program can't run for long periods of time without fastthread
>> because of a serious memory leak, so I can't say with 100% certainty
>> that these crashes don't happen without fastthread).
>
> Try replacing your stdlib's thread.rb with the attached, modified
> version which may mitigate the memory leak somewhat (though it won't
> offer fastthread's performance). I'd like to be sure that the crash
> doesn't happen without fastthread.
>
> Assuming it's a fastthread issue, I'm a little suspicious of Queues.
> Where and how does your code use them? If we can narrow it down to a
> specific class, then it is easier to derive a simple test case.

Even though my memory leak problem seems to be resolved, I still want
to help you to diagnose the crash with fastthread. My app makes
heavy use of threading, so having faster thread synchronization would
be useful for me.

I can now say that the crash only appears to happen when I use
fastthread. I mentioned that there are two types of crashes, a
segfault at exit and an "rb_gc_mark(): unknown data type" error. I
can give you some more information about the former kind of crash. I
can easily reproduce it (and I'll try to create a small program to
reproduce it later), and I've run my app with memory corruption
detection turned on in MacOS X's malloc. As far as malloc is
concerned, there are NO heap corruptions, overruns, or underruns. I
even tried with MacOS X's very aggressive libgmalloc (which puts
unwritable virtual memory pages before or after an allocated block),
and also found no heap overruns or underruns.

The segfault at exit happens when mutex objects are finalized by the
GC. Here's what I get in GDB when I start up my app, wait for it to
do a small amount of work (just enough to exercise fastthread a bit),
and then halt it with ^C, forcing the finalizers to run (note that
the app crashes on normal exit() as well, not just when forced to
quit with SIGINT):

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000005
0x001bb2f8 in free_entries (first=0x1) at fastthread.c:74
74 next = first->next;
(gdb) bt
#0 0x001bb2f8 in free_entries (first=0x1) at fastthread.c:74
#1 0x001bb368 in finalize_list (list=0x603a74) at fastthread.c:85
#2 0x001bb870 in finalize_mutex (mutex=0x603a70) at fastthread.c:227
#3 0x001bc550 in finalize_queue (queue=0x603a70) at fastthread.c:562
#4 0x001bc5b4 in free_queue (queue=0x603a70) at fastthread.c:572
#5 0x0002c8bc in rb_gc_call_finalizer_at_exit () at gc.c:1884
#6 0x00005e5c in ruby_finalize_1 () at eval.c:1549
#7 0x00006048 in ruby_cleanup (ex=1) at eval.c:1584
#8 0x00006274 in ruby_stop (ex=6) at eval.c:1615
#9 0x00006348 in ruby_run () at eval.c:1636
#10 0x00002bdc in main (argc=2, argv=0xbffff874, envp=0xbffff880) at
main.c:46
(gdb) info locals
next = (Entry *) 0x0
(gdb) up
#1 0x001bb368 in finalize_list (list=0x603a74) at fastthread.c:85
85 free_entries(list->entry_pool);
(gdb) p *list
$1 = {
  entries = 0x6040b0,
  last_entry = 0x0,
  entry_pool = 0x1,
  size = 0
}
(gdb) up
#2 0x001bb870 in finalize_mutex (mutex=0x603a70) at fastthread.c:227
227 finalize_list(&mutex->waiting);
(gdb) p *mutex
$2 = {
  owner = 6308016,
  waiting = {
    entries = 0x6040b0,
    last_entry = 0x0,
    entry_pool = 0x1,
    size = 0
  }
}
(gdb) p/x mutex->owner
$3 = 0x6040b0
(gdb) up
#3 0x001bc550 in finalize_queue (queue=0x603a70) at fastthread.c:562
562 finalize_mutex(&queue->mutex);
(gdb) p *queue
$4 = {
  mutex = {
    owner = 6308016,
    waiting = {
      entries = 0x6040b0,
      last_entry = 0x0,
      entry_pool = 0x1,
      size = 0
    }
  },
  value_available = {
    waiting = {
      entries = 0x0,
      last_entry = 0x0,
      entry_pool = 0x0,
      size = 0
    }
  },
  space_available = {
    waiting = {
      entries = 0x0,
      last_entry = 0x0,
      entry_pool = 0x0,
      size = 0
    }
  },
  values = {
    entries = 0x0,
    last_entry = 0x0,
    entry_pool = 0x0,
    size = 0
  },
  capacity = 0
}
(gdb)

The invalid values in fastthread's mutex object are similar to what
we've seen in the 2nd type of crash ("rb_gc_mark(): unknown data
type"). I'll try to create a test program to reproduce this crash at
exit, and since the corruption appears similar, this test program
should hopefully be useful for diagnosing the 2nd type of crash as well.

--Young

MenTaLguY

1/15/2007 3:28:00 PM

On Sat, 2007-01-13 at 10:13 +0900, Young Hyun wrote:

> The invalid values in fastthread's mutex object are similar to what
> we've seen in the 2nd type of crash ("rb_gc_mark(): unknown data
> type"). I'll try to create a test program to reproduce this crash at
> exit, and since the corruption appears similar, this test program
> should hopefully be useful for diagnosing the 2nd type of crash as well.

I believe the two crashes are related. Also, it appears that the
corruption only happens with Queues, not other uses of Mutexes. Likely
this means that some queue-specific routine expecting a queue is getting
passed a pointer to a member of the queue instead, or a routine
expecting one member is getting passed another.

I'd expect the compiler to catch this sort of thing, and I didn't see
anything obvious auditing the code by hand, but perhaps something's
getting lost in a VALUE cast...

Anyway, it's probably best to focus on Queue when constructing test
cases.

-mental

Young Hyun

1/17/2007 2:02:00 AM

On Jan 15, 2007, at 7:28 AM, MenTaLguY wrote:

> I believe the two crashes are related. Also, it appears that the
> corruption only happens with Queues, not other uses of Mutexes. Likely
> this means that some queue-specific routine expecting a queue is getting
> passed a pointer to a member of the queue instead, or a routine
> expecting one member is getting passed another.

The program below reproduces the problem, and surprisingly, I only
use Mutex and ConditionVariable--no Queue.

========================================
require 'fastthread'
require 'thread'

class GlobalSpaceMux

  def initialize()
    @mutex = Mutex.new
    @condition = ConditionVariable.new
    @queue = Array.new

    @send_thread = Thread.new(&method(:send_thread_loop))
  end

  def send_thread_loop
    loop do
      @mutex.synchronize do
        @condition.wait(@mutex) while @queue.empty?
        @queue.shift
      end
    end
  end

end

x = GlobalSpaceMux.new
========================================

$ gdb ~/ruby-1.8.5-p12/bin/ruby
(gdb) r zzz-crash5.rb
Starting program: /Users/youngh/ruby-1.8.5-p12/bin/ruby zzz-crash5.rb
Reading symbols for shared libraries .. done
Reading symbols for shared libraries . done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000005
0x001772f8 in free_entries (first=0x1) at fastthread.c:74
74 next = first->next;
(gdb) bt
#0 0x001772f8 in free_entries (first=0x1) at fastthread.c:74
#1 0x00177368 in finalize_list (list=0x434614) at fastthread.c:85
#2 0x00177870 in finalize_mutex (mutex=0x434610) at fastthread.c:227
#3 0x00178550 in finalize_queue (queue=0x434610) at fastthread.c:562
#4 0x001785b4 in free_queue (queue=0x434610) at fastthread.c:572
#5 0x0002c8bc in rb_gc_call_finalizer_at_exit () at gc.c:1884
#6 0x00005e5c in ruby_finalize_1 () at eval.c:1549
#7 0x00006048 in ruby_cleanup (ex=0) at eval.c:1584
#8 0x00006274 in ruby_stop (ex=0) at eval.c:1615
#9 0x00006348 in ruby_run () at eval.c:1636
#10 0x00002bdc in main (argc=2, argv=0xbffff780, envp=0xbffff78c) at
main.c:46
(gdb)
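
For contrast with the reproduction program above, here is a sketch of the same Mutex/ConditionVariable pattern with a producer side and an explicit shutdown, so that no thread is left waiting on the condition variable when the process exits. It uses only the stock thread library (fastthread is deliberately not loaded), and the class name `SketchMux` and the methods `enqueue`, `shutdown`, and `sent` are hypothetical, not part of my application. It is a sketch of the pattern, not a fix for the fastthread crash.

```ruby
require 'thread'

# Hypothetical stand-in for GlobalSpaceMux; stock Mutex/ConditionVariable only.
class SketchMux
  attr_reader :sent

  def initialize
    @mutex = Mutex.new
    @condition = ConditionVariable.new
    @queue = []
    @sent = []
    @done = false
    @send_thread = Thread.new { send_thread_loop }
  end

  # Producer side: enqueue under the lock, then wake the sender thread.
  def enqueue(item)
    @mutex.synchronize do
      @queue << item
      @condition.signal
    end
  end

  # Ask the sender to drain the queue and stop, then wait for it to finish.
  def shutdown
    @mutex.synchronize do
      @done = true
      @condition.signal
    end
    @send_thread.join
  end

  private

  def send_thread_loop
    loop do
      item = nil
      stop = false
      @mutex.synchronize do
        # Sleep until there is work or shutdown has been requested.
        @condition.wait(@mutex) while @queue.empty? && !@done
        if @queue.empty?
          stop = true          # only reachable once @done is set
        else
          item = @queue.shift
        end
      end
      break if stop
      @sent << item            # stands in for the real send operation
    end
  end
end
```

A caller would create the object, call `enqueue` as items arrive, and call `shutdown` before exiting so the sender thread has returned by the time finalizers run.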

--Young

MenTaLguY

1/18/2007 4:11:00 PM

On Wed, 2007-01-17 at 11:02 +0900, Young Hyun wrote:
> The program below reproduces the problem, and surprisingly, I only
> use Mutex and ConditionVariable--no Queue.

I've reduced the test case you gave me down to:

require 'fastthread'
require 'thread'

mutex = Mutex.new
condition = ConditionVariable.new
t = Thread.new do
  mutex.lock
  condition.wait mutex
end
exit

The problem, in this case, appears to be coping with the mutex and
condition variable getting destroyed at program exit while there is
still a thread waiting on the condition variable.

Simple waits on mutexes do not appear to be affected; for instance, this
does not crash for me:

require 'fastthread'
require 'thread'

mutex = Mutex.new
mutex.lock
t = Thread.new do
  mutex.lock
end
exit

I'm currently unsure whether it's related to the other (non-exit-time)
crash or not.

-mental

MenTaLguY

1/18/2007 4:37:00 PM

On Fri, 2007-01-19 at 01:11 +0900, MenTaLguY wrote:
> The problem, in this case, appears to be coping with the mutex and
> condition variable getting destroyed at program exit while there is
> still a thread waiting on the condition variable.

Actually ... that turns out to not be the problem. The thread is killed
and the wait queue empty by the time the condition variable is
finalized.

Bizarrely, what appears to be happening is that rb_queue_alloc is
getting called when creating mutexes and condition variables. Hence
free_queue is getting called when mutexes and condition variables are
destroyed, and chaos ensues.

At least it's a simple problem, but it's a WEIRD one.

-mental

MenTaLguY

1/18/2007 6:32:00 PM

It turns out that the fastthread crash is due to a bug in the Ruby
interpreter's handling of allocator functions for anonymous classes:

http://rubyforge.org/tracker/index.php?func=detail&aid=7974&group_id=426&...

I will try to come up with a workaround.

-mental

comp.lang.ruby
