Asp Forum - Rails on Altix ia64

Adam P. Jenkins

5/17/2005 12:53:00 AM

Has anyone successfully run a Rails application on an Altix, or other
ia64 machine? I'm getting a segmentation fault in gc.c. More info below.

I'm using ruby 1.8.2, and Rails 0.12.1. uname -a returns

Linux isc-altix 2.4.21-sgi304r1 #1 SMP Sat Jan 29 22:43:29 PST 2005 ia64
ia64 ia64 GNU/Linux

I built ruby as follows:

env CFLAGS='-O0 -g' ./configure --prefix=$HOME/ruby
make
make test
make install

This all goes fine. I then proceed to also install the rails gems, and
my app. (By the way, it works fine on an ia32 Linux machine.) I then
run webrick, which outputs this:

$ script/server
=> Rails application started on http://0....
[2005-05-16 20:48:32] INFO WEBrick 1.3.1
[2005-05-16 20:48:32] INFO ruby 1.8.2 (2004-12-25) [ia64-linux]
[2005-05-16 20:48:32] INFO WEBrick::HTTPServer#start: pid=4826 port=3000

So far so good. Then I make a request to webrick and get this message:

/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:40: [BUG] Segmentation fault
ruby 1.8.2 (2004-12-25) [ia64-linux]

Aborted (core dumped)

Examining the core dump in gdb shows:

(gdb) bt
#0 raise (sig=0) at ../nptl/sysdeps/unix/sysv/linux/raise.c:30
#1 0x20000000001a4a40 in abort () at ../sysdeps/generic/abort.c:50
#2 0x40000000002008f0 in rb_bug (fmt=0x40000000002244d8 "Segmentation
fault") at error.c:214
#3 0x4000000000177a10 in sigsegv (sig=11) at signal.c:446
#4 <signal handler called>
#5 gc_mark (ptr=Cannot access memory at address 0xfffffffffffffff0
) at gc.c:713
#6 0x40000000000798f0 in rb_gc_mark (ptr=8) at gc.c:736
#7 0x4000000000050a20 in thread_mark (th=0x60000000008cee90) at eval.c:9705
#8 0x400000000007a5e0 in gc_mark_children (ptr=2305843009237353224,
lev=2) at gc.c:942
#9 0x4000000000079880 in gc_mark (ptr=2305843009237353224, lev=1) at
gc.c:729
#10 0x400000000007a880 in gc_mark_children (ptr=2305843009237353304,
lev=1) at gc.c:974
#11 0x4000000000079880 in gc_mark (ptr=2305843009237353304, lev=0) at
gc.c:729
#12 0x4000000000079060 in mark_locations_array (x=0x60000ffffff59690,
n=82526) at gc.c:624
#13 0x4000000000079150 in rb_gc_mark_locations
(start=0x60000ffffff50f10, end=0x60000fffffffa988)
at gc.c:637
#14 0x400000000007d250 in garbage_collect () at gc.c:1354
#15 0x400000000000f620 in __do_jv_register_classes ()
#16 0x4000000000079650 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:712
#17 0x4000000000079650 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:712
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
Previous frame identical to this frame (corrupt stack?)
(gdb)

Seems like some data structures in the GC are getting corrupted. Has
anyone else run into this, and found a workaround or fix? Or is it
known not to work on this architecture? Thank you for any information.

Adam

13 Answers

Nakada, Nobuyoshi

5/17/2005 2:24:00 AM

Hi,

At Tue, 17 May 2005 09:55:30 +0900,
Adam P. Jenkins wrote in [ruby-talk:142863]:
> Has anyone successfully run a Rails application on an Altix, or other
> ia64 machine? I'm getting a segmentation fault in gc.c. More info below.

Some GC bugs on IA64 have been fixed since the 1.8.2 release. Try
1.8.3 preview1.

--
Nobu Nakada

Adam P. Jenkins

5/17/2005 2:34:00 AM

Nakada, Nobuyoshi wrote:
> Hi,
>
> At Tue, 17 May 2005 09:55:30 +0900,
> Adam P. Jenkins wrote in [ruby-talk:142863]:
>
>> Has anyone successfully run a Rails application on an Altix, or other
>> ia64 machine? I'm getting a segmentation fault in gc.c. More info
>> below.
>
>
> Some GC bugs on IA64 have been fixed since the 1.8.2 release. Try
> 1.8.3 preview1.
>

Thank you for your quick response. I tried 1.8.3 preview1, and get the
same error. When it happens, in gc_mark, ptr is always == 8. The
function tries to interpret it as a pointer and dereference it.
Here's the result of trying my rails app with 1.8.3:

Start webrick:

$ script/server
=> Rails application started on http://0....
[2005-05-16 22:37:33] INFO WEBrick 1.3.1
[2005-05-16 22:37:33] INFO ruby 1.8.3 (2005-05-12) [ia64-linux]
[2005-05-16 22:37:33] INFO WEBrick::HTTPServer#start: pid=16740 port=3000

Make a request from my browser, and webrick says:

/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
ruby 1.8.3 (2005-05-12) [ia64-linux]

Aborted (core dumped)

Stack trace from the core dump shows:

(gdb) bt
#0 raise (sig=0) at ../nptl/sysdeps/unix/sysv/linux/raise.c:30
#1 0x20000000001a4a40 in abort () at ../sysdeps/generic/abort.c:50
#2 0x40000000002051c0 in rb_bug (fmt=0x4000000000228f30 "Segmentation
fault") at error.c:214
#3 0x400000000017c040 in sigsegv (sig=11) at signal.c:446
#4 <signal handler called>
#5 gc_mark (ptr=Cannot access memory at address 0xfffffffffffffff0
) at gc.c:715
#6 0x400000000007a9d0 in rb_gc_mark (ptr=8) at gc.c:738
#7 0x40000000000519b0 in thread_mark (th=0x6000000000906d40) at eval.c:9793
#8 0x400000000007b6c0 in gc_mark_children (ptr=2305843009236888272,
lev=2) at gc.c:944
#9 0x400000000007a960 in gc_mark (ptr=2305843009236888272, lev=1) at
gc.c:731
#10 0x400000000007b960 in gc_mark_children (ptr=2305843009236888352,
lev=1) at gc.c:976
#11 0x400000000007a960 in gc_mark (ptr=2305843009236888352, lev=0) at
gc.c:731
#12 0x400000000007a130 in mark_locations_array (x=0x60000ffffff58440,
n=82490) at gc.c:626
#13 0x400000000007a230 in rb_gc_mark_locations
(start=0x60000ffffff50760, end=0x60000fffffff9618)
at gc.c:639
#14 0x400000000007e330 in garbage_collect () at gc.c:1356
#15 0x400000000000f680 in __do_jv_register_classes ()
#16 0x400000000007a730 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:714
#17 0x400000000007a730 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:714
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
Previous frame identical to this frame (corrupt stack?)
(gdb)

Adam

Adam P. Jenkins

5/17/2005 9:25:00 PM

Adam P. Jenkins wrote:
> Nakada, Nobuyoshi wrote:
>
>> Hi,
>>
>> At Tue, 17 May 2005 09:55:30 +0900,
>> Adam P. Jenkins wrote in [ruby-talk:142863]:
>>
>>> Has anyone successfully run a Rails application on an Altix, or other
>>> ia64 machine? I'm getting a segmentation fault in gc.c. More info
>>> below.
>>
>>
>>
>> Some GC bugs on IA64 have been fixed since the 1.8.2 release. Try
>> 1.8.3 preview1.
>>
>
> Thank you for your quick response. I tried 1.8.3 preview1, and get the
> same error. When it happens, in gc_mark, ptr is always == 8. The
> function tries to interpret it as a pointer and dereference it. Here's
> the result of trying my rails app with 1.8.3:
>
> Start webrick:
>
> $ script/server
> => Rails application started on http://0....
> [2005-05-16 22:37:33] INFO WEBrick 1.3.1
> [2005-05-16 22:37:33] INFO ruby 1.8.3 (2005-05-12) [ia64-linux]
> [2005-05-16 22:37:33] INFO WEBrick::HTTPServer#start: pid=16740 port=3000
>
>
> Make a request from my browser, and webrick says:
>
> /home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
> ruby 1.8.3 (2005-05-12) [ia64-linux]
>
> Aborted (core dumped)

I was able to work around the problem by using a version of Ruby
compiled on an ia32 machine. I just compiled Ruby on an ia32 machine running
Fedora Core 3, and transferred it to the Altix. Now my Rails app runs
fine on the Altix, albeit much slower than with the natively compiled
Ruby. This is a satisfactory workaround for me for now.

Before realizing I could work around the problem this way, I discovered
some more information about the problem which may be of interest to the
Ruby developers. First of all, to reproduce the problem (I'm using the
ruby-1.8.3-preview1 tarball):

# Build Ruby on the Altix
$ uname -a
Linux isc-altix 2.4.21-sgi304r1 #1 SMP Sat Jan 29 22:43:29 PST 2005 ia64
ia64 ia64 GNU/Linux
$ cd ruby-1.8.3
$ env CFLAGS='-O0 -g' ./configure --prefix=$DEPLOYDIR
$ make
$ make test
test succeeded
$ make install
$ PATH=$DEPLOYDIR/bin:$PATH

# Install Gem
$ cd ../rubygems-0.8.10
$ ruby setup.rb

# Install Rails
$ gem install --no-rdoc --include-dependencies rails

# Create a test rails app
$ cd ~/
$ rails testapp

# Run the app

$ testapp/script/server
=> Rails application started on http://0....
[2005-05-17 15:40:02] INFO WEBrick 1.3.1
[2005-05-17 15:40:02] INFO ruby 1.8.3 (2005-05-12) [ia64-linux]
[2005-05-17 15:40:02] INFO WEBrick::HTTPServer#start: pid=4836 port=3000

Make a request to http://localhost:3000 from a browser, and webrick outputs:

/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
ruby 1.8.3 (2005-05-12) [ia64-linux]

Aborted

The code at timeout.rb:41 is a call to Thread.start. I tried modifying
timeout to not spawn a thread, and then I'd get segfaults elsewhere,
always at a call to Thread.start. So it seems to have something to do
with threading, since single threaded code seems to work fine. When I
look at the core dumps, the segfault is always in gc_mark(), at
gc.c:715. The ptr parameter to gc_mark always has the value 8. gc_mark
tries to treat this as a pointer, and gets a segfault. I didn't
understand the code in gc.c enough to get much farther in figuring out
how this value got there.

The start of gc_mark() looks like this:

void
gc_mark(ptr, lev)
VALUE ptr;
int lev;
{
register RVALUE *obj;

obj = RANY(ptr);
if (rb_special_const_p(ptr)) return; /* special const not marked */
if (obj->as.basic.flags == 0) return; /* free cell */

The call to rb_special_const_p returns true if either of the bottom two
bits of ptr is set (or if ptr is the false or nil constant). The code
above seems to assume that otherwise ptr is really a pointer value. So when
gc_mark is called with the value 8 for its ptr parameter, it goes on to
execute the next line, which dereferences obj, and causes a segfault.
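
To make the check concrete, here is a minimal sketch in C of the special-constant test applied to the value 8. The macro definitions are written from memory of the Ruby 1.8 ruby.h and should be treated as assumptions, not as a copy of the real header; is_special_const and sanity_checks are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the Ruby 1.8 definitions; reproduced from memory. */
typedef uintptr_t VALUE;

#define Qfalse ((VALUE)0)
#define Qtrue  ((VALUE)2)
#define Qnil   ((VALUE)4)

#define IMMEDIATE_MASK 0x03
#define IMMEDIATE_P(x) ((VALUE)(x) & IMMEDIATE_MASK)
#define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)
#define SPECIAL_CONST_P(x) (IMMEDIATE_P(x) || !RTEST(x))

/* Hypothetical helper: returns 1 when a marker would skip the value
 * as a special constant, 0 when it would treat it as an object pointer. */
int is_special_const(VALUE v)
{
    return SPECIAL_CONST_P(v) ? 1 : 0;
}

/* Hypothetical helper documenting the cases discussed in the thread. */
void sanity_checks(void)
{
    assert(is_special_const(Qnil));       /* nil is skipped */
    assert(is_special_const(Qfalse));     /* false is skipped */
    assert(is_special_const(Qtrue));      /* bit 1 set, skipped */
    assert(is_special_const((VALUE)1));   /* Fixnum tag bit set, skipped */
    assert(!is_special_const((VALUE)8));  /* low bits clear, treated as a pointer */
}
```

Under these assumed definitions the value 8 passes the filter, so the caller would go on to dereference it, which is consistent with the crash location reported above.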

Hope this helps a little.
Adam

Yukihiro Matsumoto

6/5/2005 12:17:00 PM

Hi,

Sorry for being late.

In message "Re: Rails on Altix ia64"
on Wed, 18 May 2005 06:25:31 +0900, "Adam P. Jenkins" <ajenkins@interactivesupercomputing.com> writes:

|/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
|ruby 1.8.3 (2005-05-12) [ia64-linux]
|Aborted
|
|The code at timeout.rb:41 is a call to Thread.start. I tried modifying
|timeout to not spawn a thread, and then I'd get segfaults elsewhere,
|always at a call to Thread.start. So it seems to have something to do
|with threading, since single threaded code seems to work fine. When I
|look at the core dumps, the segfault is always in gc_mark(), at
|gc.c:715. The ptr parameter to gc_mark always has the value 8. gc_mark
|tries to treat this as a pointer, and gets a segfault. I didn't
|understand the code in gc.c enough to get much farther in figuring out
|how this value got there.

The ptr value should not be 8 in any case. It must be caused by a bug
somewhere. Can you show us a stack trace (by gdb's "where" command)?

matz.

Tanaka Akira

6/7/2005 11:07:00 AM

In article <t9GdnaIzKMnk3RTfRVn-uw@rcn.net>,
"Adam P. Jenkins" <ajenkins@interactivesupercomputing.com> writes:

> Has anyone successfully run a Rails application on an Altix, or other
> ia64 machine? I'm getting a segmentation fault in gc.c. More info below.

I'd like to know whether the following patch fixes your problem.

Index: eval.c
===================================================================
RCS file: /src/ruby/eval.c,v
retrieving revision 1.616.2.98
diff -u -r1.616.2.98 eval.c
--- eval.c 25 May 2005 23:09:05 -0000 1.616.2.98
+++ eval.c 7 Jun 2005 11:02:32 -0000
@@ -111,7 +111,27 @@
abort(); /* ensure noreturn */
}
#define longjmp(env, val) rb_jump_context(env, val)
-#define setjmp(j) ((j)->status = 0, getcontext(&(j)->context), (j)->status)
+#define callee_save_registers_may_be_breaked_here \
+ __asm__ volatile ("" : : : \
+ "in0", "in1", "in2", "in3", "in4", "in5", "in6", "in7", \
+ "loc0", "loc1", "loc2", "loc3", "loc4", "loc5", "loc6", "loc7", \
+ "loc8", "loc9", "loc10","loc11","loc12","loc13","loc14","loc15", \
+ "loc16","loc17","loc18","loc19","loc20","loc21","loc22","loc23", \
+ "loc24","loc25","loc26","loc27","loc28","loc29","loc30","loc31", \
+ "loc32","loc33","loc34","loc35","loc36","loc37","loc38","loc39", \
+ "loc40","loc41","loc42","loc43","loc44","loc45","loc46","loc47", \
+ "loc48","loc49","loc50","loc51","loc52","loc53","loc54","loc55", \
+ "loc56","loc57","loc58","loc59","loc60","loc61","loc62","loc63", \
+ "loc64","loc65","loc66","loc67","loc68","loc69","loc70","loc71", \
+ "loc72","loc73","loc74","loc75","loc76","loc77","loc78","loc79", \
+ "out0", "out1", "out2", "out3", "out4", "out5", "out6", "out7");
+#define setjmp(j) ({ \
+ ucontext_t *ucp; \
+ (j)->status = 0; \
+ ucp = &(j)->context; \
+ callee_save_registers_may_be_breaked_here; \
+ getcontext(ucp); \
+ (j)->status; })
#else
typedef jmp_buf rb_jmpbuf_t;
#ifndef setjmp
@@ -9918,8 +9938,10 @@
{
ucontext_t ctx;
VALUE *top, *bot;
+ ucontext_t *ctxp = &ctx;

- getcontext(&ctx);
+ callee_save_registers_may_be_breaked_here;
+ getcontext(ctxp);
bot = (VALUE*)__libc_ia64_register_backing_store_base;
#if defined(__FreeBSD__)
top = (VALUE*)ctx.uc_mcontext.mc_special.bspstore;

I know the patch is very ugly and gcc- and IA64-dependent, and it tickles
a gcc-3.3.5 optimization bug.
--
Tanaka Akira

Adam P. Jenkins

6/8/2005 6:02:00 AM

Yukihiro Matsumoto wrote:
> Hi,
>
> Sorry for being late.
>
> In message "Re: Rails on Altix ia64"
> on Wed, 18 May 2005 06:25:31 +0900, "Adam P. Jenkins" <ajenkins@interactivesupercomputing.com> writes:
>
> |/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
> |ruby 1.8.3 (2005-05-12) [ia64-linux]
> |Aborted
> |
> |The code at timeout.rb:41 is a call to Thread.start. I tried modifying
> |timeout to not spawn a thread, and then I'd get segfaults elsewhere,
> |always at a call to Thread.start. So it seems to have something to do
> |with threading, since single threaded code seems to work fine. When I
> |look at the core dumps, the segfault is always in gc_mark(), at
> |gc.c:715. The ptr parameter to gc_mark always has the value 8. gc_mark
> |tries to treat this as a pointer, and gets a segfault. I didn't
> |understand the code in gc.c enough to get much farther in figuring out
> |how this value got there.
>
> The ptr value should not be 8 in any case. It must be caused by a bug
> somewhere. Can you show us a stack trace (by gdb's "where" command)?
>
> matz.

I did post a couple of stack traces earlier in this thread. They were
posted on 5/16/2005. Here it is again. Note that in stack frame #6,
the ptr parameter == 8, but in stack frame #5, it's somehow become
0xfffff.... When I ran the program under gdb instead of examining the
core dump afterward, the value would still be 8 when gc_mark was called.

Adam

/home/ajenkins/deploy/lib/ruby/1.8/timeout.rb:41: [BUG] Segmentation fault
ruby 1.8.3 (2005-05-12) [ia64-linux]

Aborted (core dumped)

Stack trace from the core dump shows:

(gdb) bt
#0 raise (sig=0) at ../nptl/sysdeps/unix/sysv/linux/raise.c:30
#1 0x20000000001a4a40 in abort () at ../sysdeps/generic/abort.c:50
#2 0x40000000002051c0 in rb_bug (fmt=0x4000000000228f30 "Segmentation
fault") at error.c:214
#3 0x400000000017c040 in sigsegv (sig=11) at signal.c:446
#4 <signal handler called>
#5 gc_mark (ptr=Cannot access memory at address 0xfffffffffffffff0
) at gc.c:715
#6 0x400000000007a9d0 in rb_gc_mark (ptr=8) at gc.c:738
#7 0x40000000000519b0 in thread_mark (th=0x6000000000906d40) at eval.c:9793
#8 0x400000000007b6c0 in gc_mark_children (ptr=2305843009236888272,
lev=2) at gc.c:944
#9 0x400000000007a960 in gc_mark (ptr=2305843009236888272, lev=1) at
gc.c:731
#10 0x400000000007b960 in gc_mark_children (ptr=2305843009236888352,
lev=1) at gc.c:976
#11 0x400000000007a960 in gc_mark (ptr=2305843009236888352, lev=0) at
gc.c:731
#12 0x400000000007a130 in mark_locations_array (x=0x60000ffffff58440,
n=82490) at gc.c:626
#13 0x400000000007a230 in rb_gc_mark_locations
(start=0x60000ffffff50760, end=0x60000fffffff9618)
at gc.c:639
#14 0x400000000007e330 in garbage_collect () at gc.c:1356
#15 0x400000000000f680 in __do_jv_register_classes ()
#16 0x400000000007a730 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:714
#17 0x400000000007a730 in gc_mark (ptr=Cannot access memory at address
0xfffffffffffffff0
) at gc.c:714
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
Previous frame identical to this frame (corrupt stack?)
(gdb)

Adam P. Jenkins

6/8/2005 7:15:00 PM

Tanaka Akira wrote:
> In article <t9GdnaIzKMnk3RTfRVn-uw@rcn.net>,
> "Adam P. Jenkins" <ajenkins@interactivesupercomputing.com> writes:
>
>
>>Has anyone successfully run a Rails application on an Altix, or other
>>ia64 machine? I'm getting a segmentation fault in gc.c. More info below.
>
>
> I'd like to know whether the following patch fixes your problem.
>
> Index: eval.c
> ===================================================================
> RCS file: /src/ruby/eval.c,v
> retrieving revision 1.616.2.98
> diff -u -r1.616.2.98 eval.c
> --- eval.c 25 May 2005 23:09:05 -0000 1.616.2.98
> +++ eval.c 7 Jun 2005 11:02:32 -0000
> @@ -111,7 +111,27 @@
> abort(); /* ensure noreturn */
> }
> #define longjmp(env, val) rb_jump_context(env, val)
> -#define setjmp(j) ((j)->status = 0, getcontext(&(j)->context), (j)->status)
> +#define callee_save_registers_may_be_breaked_here \
> + __asm__ volatile ("" : : : \
> + "in0", "in1", "in2", "in3", "in4", "in5", "in6", "in7", \
> + "loc0", "loc1", "loc2", "loc3", "loc4", "loc5", "loc6", "loc7", \
> + "loc8", "loc9", "loc10","loc11","loc12","loc13","loc14","loc15", \
> + "loc16","loc17","loc18","loc19","loc20","loc21","loc22","loc23", \
> + "loc24","loc25","loc26","loc27","loc28","loc29","loc30","loc31", \
> + "loc32","loc33","loc34","loc35","loc36","loc37","loc38","loc39", \
> + "loc40","loc41","loc42","loc43","loc44","loc45","loc46","loc47", \
> + "loc48","loc49","loc50","loc51","loc52","loc53","loc54","loc55", \
> + "loc56","loc57","loc58","loc59","loc60","loc61","loc62","loc63", \
> + "loc64","loc65","loc66","loc67","loc68","loc69","loc70","loc71", \
> + "loc72","loc73","loc74","loc75","loc76","loc77","loc78","loc79", \
> + "out0", "out1", "out2", "out3", "out4", "out5", "out6", "out7");
> +#define setjmp(j) ({ \
> + ucontext_t *ucp; \
> + (j)->status = 0; \
> + ucp = &(j)->context; \
> + callee_save_registers_may_be_breaked_here; \
> + getcontext(ucp); \
> + (j)->status; })
> #else
> typedef jmp_buf rb_jmpbuf_t;
> #ifndef setjmp
> @@ -9918,8 +9938,10 @@
> {
> ucontext_t ctx;
> VALUE *top, *bot;
> + ucontext_t *ctxp = &ctx;
>
> - getcontext(&ctx);
> + callee_save_registers_may_be_breaked_here;
> + getcontext(ctxp);
> bot = (VALUE*)__libc_ia64_register_backing_store_base;
> #if defined(__FreeBSD__)
> top = (VALUE*)ctx.uc_mcontext.mc_special.bspstore;
>
>
> I know the patch is very ugly and gcc- and IA64-dependent, and it tickles
> a gcc-3.3.5 optimization bug.

Thank you very much! I just tried applying this patch, and now the
crash does not occur.

Thank you,
Adam

Tanaka Akira

6/9/2005 6:59:00 AM

In article <SpednduBRbQs3jrfRVn-qA@rcn.net>,
"Adam P. Jenkins" <thorin@theshire.com> writes:

> Thank you very much! I just tried applying this patch, and now the
> crash does not occur.

I see. I think I understand why ruby is so unstable on IA64, now.

This is an updated patch which uses a magic setjmp to avoid the problem.

It makes SEGVs much rarer.
I got only 2 SEGVs in 100 test-all runs with ruby built with -O2.
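
A minimal sketch of the "magic setjmp" idea in C, assuming a POSIX system that provides <ucontext.h>. It is not the patch itself: resume_once and ctx_buf_t are hypothetical names, and keeping the counter in a volatile local is an extra precaution of this sketch. The point it illustrates is that a setjmp call which never executes still sits in the same function as getcontext, so a compiler that knows setjmp may return twice generates code that tolerates the second return.

```c
#include <assert.h>
#include <setjmp.h>
#include <ucontext.h>

/* Hypothetical stand-in for a jump buffer built on ucontext. */
typedef struct {
    ucontext_t context;
    volatile int status;
} ctx_buf_t;

static jmp_buf dummy_jmp_buf;
static volatile int dummy_setjmp_false = 0;  /* always 0 at run time */

/* Saves a context, resumes it exactly once with setcontext, and returns
 * how many times execution passed the point after getcontext.
 * Returns -1 if getcontext or setcontext fails. */
int resume_once(void)
{
    static ctx_buf_t buf;
    volatile int passes = 0;  /* kept in memory, survives the resume */

    buf.status = 0;

    /* Never executed; present only so the compiler sees a setjmp call
     * in this function and treats it as one that may return twice. */
    if (dummy_setjmp_false)
        (void)setjmp(dummy_jmp_buf);

    if (getcontext(&buf.context) != 0)
        return -1;

    passes++;
    assert(passes <= 2);  /* the resumed path must run only once */

    if (buf.status == 0) {
        buf.status = 1;            /* mark that we are about to resume */
        setcontext(&buf.context);  /* jumps back to just after getcontext */
        return -1;                 /* reached only if setcontext fails */
    }
    return passes;
}
```

On a system where getcontext and setcontext work as specified, resume_once() should return 2: one pass for the initial getcontext return and one for the resume.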

Index: eval.c
===================================================================
RCS file: /src/ruby/eval.c,v
retrieving revision 1.616.2.99
diff -u -r1.616.2.99 eval.c
--- eval.c 7 Jun 2005 23:33:50 -0000 1.616.2.99
+++ eval.c 9 Jun 2005 04:08:08 -0000
@@ -32,9 +32,8 @@
#if defined(HAVE_GETCONTEXT) && defined(HAVE_SETCONTEXT)
#include <ucontext.h>
#define USE_CONTEXT
-#else
-#include <setjmp.h>
#endif
+#include <setjmp.h>

#include "st.h"
#include "dln.h"
@@ -98,8 +97,6 @@
volatile int status;
} rb_jmpbuf_t[1];

-#undef longjmp
-#undef setjmp
NORETURN(static void rb_jump_context(rb_jmpbuf_t, int));
static inline void
rb_jump_context(env, val)
@@ -110,15 +107,45 @@
setcontext(&env->context);
abort(); /* ensure noreturn */
}
-#define longjmp(env, val) rb_jump_context(env, val)
-#define setjmp(j) ((j)->status = 0, getcontext(&(j)->context), (j)->status)
+/*
+ * DUMMY_SETJMP is a magic for getcontext, gcc and IA64 register stack
+ * combination problem.
+ *
+ * Assume following code sequence.
+ *
+ * 1. set a register in the register stack such as r32.
+ * 2. call getcontext.
+ * 3. use the register.
+ * 4. update the register for other use.
+ * 5. call setcontext directly or indirectly.
+ *
+ * This code should be run as 1->2->3->4->5->3->4.
+ * But after second getcontext return (second 3),
+ * the register is broken (updated).
+ * It's because getcontext/setcontext doesn't preserve the content of the
+ * register stack.
+ *
+ * setjmp also doesn't preserve the content of the register stack.
+ * But it has not the problem because gcc knows setjmp may return twice.
+ * gcc detects setjmp and generates setjmp safe code.
+ *
+ * So setjmp call before getcontext call fix the problem.
+ * It is not required that setjmp is called at run time, since the problem is
+ * register usage.
+ */
+jmp_buf dummy_jmp_buf;
+int dummy_setjmp_false = 0;
+#define DUMMY_SETJMP (dummy_setjmp_false ? setjmp(dummy_jmp_buf) : 0)
+#define ruby_longjmp(env, val) rb_jump_context(env, val)
+#define ruby_setjmp(j) ((j)->status = 0, DUMMY_SETJMP, getcontext(&(j)->context), (j)->status)
#else
typedef jmp_buf rb_jmpbuf_t;
-#ifndef setjmp
-#ifdef HAVE__SETJMP
-#define setjmp(env) _setjmp(env)
-#define longjmp(env,val) _longjmp(env,val)
-#endif
+#if !defined(setjmp) && defined(HAVE__SETJMP)
+#define ruby_setjmp(env) _setjmp(env)
+#define ruby_longjmp(env,val) _longjmp(env,val)
+#else
+#define ruby_setjmp(env) setjmp(env)
+#define ruby_longjmp(env,val) longjmp(env,val)
#endif
#endif

@@ -927,12 +954,12 @@
#define PROT_LAMBDA INT2FIX(2) /* 5 */
#define PROT_YIELD INT2FIX(3) /* 7 */

-#define EXEC_TAG() (FLUSH_REGISTER_WINDOWS, setjmp(prot_tag->buf))
+#define EXEC_TAG() (FLUSH_REGISTER_WINDOWS, ruby_setjmp(prot_tag->buf))

#define JUMP_TAG(st) do { \
ruby_frame = prot_tag->frame; \
ruby_iter = prot_tag->iter; \
- longjmp(prot_tag->buf,(st)); \
+ ruby_longjmp(prot_tag->buf,(st)); \
} while (0)

#define POP_TAG() \
@@ -10006,7 +10033,7 @@

#define THREAD_SAVE_CONTEXT(th) \
(rb_thread_save_context(th),\
- rb_thread_switch((FLUSH_REGISTER_WINDOWS, setjmp((th)->context))))
+ rb_thread_switch((FLUSH_REGISTER_WINDOWS, ruby_setjmp((th)->context))))

NORETURN(static void rb_thread_restore_context _((rb_thread_t,int)));

@@ -10087,7 +10114,7 @@
rb_backref_set(tmp->last_match);
tmp->last_match = tval;

- longjmp(tmp->context, ex);
+ ruby_longjmp(tmp->context, ex);
}

static void
Index: gc.c
===================================================================
RCS file: /src/ruby/gc.c,v
retrieving revision 1.168.2.18
diff -u -r1.168.2.18 gc.c
--- gc.c 20 Jan 2005 09:34:36 -0000 1.168.2.18
+++ gc.c 9 Jun 2005 04:08:08 -0000
@@ -1483,14 +1483,6 @@
STACK_LEVEL_MAX = (rlim.rlim_cur - space) / sizeof(VALUE);
}
}
-#if defined(__ia64__) && (!defined(__GNUC__) || __GNUC__ < 2 || defined(__OPTIMIZE__))
- /* ruby crashes on IA64 if compiled with optimizer on */
- /* when if STACK_LEVEL_MAX is greater than this magic number */
- /* I know this is a kludge. I suspect optimizer bug */
-#define IA64_MAGIC_STACK_LIMIT 49152
- if (STACK_LEVEL_MAX > IA64_MAGIC_STACK_LIMIT)
- STACK_LEVEL_MAX = IA64_MAGIC_STACK_LIMIT;
-#endif
#endif
}

--
Tanaka Akira

Yukihiro Matsumoto

6/9/2005 8:56:00 AM

Hi,

In message "Re: Rails on Altix ia64"
on Thu, 9 Jun 2005 15:59:27 +0900, Tanaka Akira <akr@m17n.org> writes:

|> Thank you very much! I just tried applying this patch, and now the
|> crash does not occur.
|
|I see. I think I understand why ruby is so unstable on IA64, now.
|This is an updated patch which uses a magic setjmp to avoid the problem.

Can you commit?

matz.

Tanaka Akira

6/9/2005 12:36:00 PM

In article <1118307336.315169.24849.nullmailer@x31.priv.netlab.jp>,
Yukihiro Matsumoto <matz@ruby-lang.org> writes:

> Can you commit?

committed.

However I think Ruby has other problems with IA64.

For example, Ruby doesn't flush the register stack.
(Boehm GC does.)
--
Tanaka Akira

comp.lang.ruby
