Asp Forum - building extension modules, and linking

jmg3000

9/13/2006 7:09:00 PM

When you load an extension module, what's the mechanism that makes
those C calls (the ones that call into the Ruby API) actually get
connected to the currently-running instance of ruby?

When I build C code (not an extension module) that uses some external
functions -- say, in a shared lib -- the compiler finds the headers at
compile time. At link-edit time, the linker checks that the code fits
the shared lib (i.e., it makes sure that the functions declared in the
headers and called from my source are actually provided by the shared
lib). At runtime (dynamic link time), the OS hunts down the .so file,
loads it, and gives my program a connection to it. I've got that much.

But when we're building an extension module, how do you tell GCC
(at link-edit time) that you want your code to link to (at runtime)
what's already loaded and running -- that is, to link to the ruby
interpreter -- rather than to some shared lib somewhere?

Incidentally, I notice that I don't even have a libruby.so anywhere on
my system. I've got an /opt/ruby-1.8.4/lib/libruby-static.a though.
That makes sense to me I suppose, since there's only one app (i.e.,
ruby) that will need to load that library, and you want each instance
of ruby to have its own private memory structures anyway. But I don't
think I'm supposed to link my extension module with that static
library...

Any insights or words of wisdom are most appreciated.

Thanks,
---John

Tim Becker

9/13/2006 7:18:00 PM

> But when we're building an extension module, how do you tell GCC
> (at link-edit time) that you want your code to link to (at runtime)
> what's already loaded and running -- that is, to link to the ruby
> interpreter -- rather than to some shared lib somewhere?

I think the point you're missing is that at runtime the interpreter
loads your extension, not the other way around. Otherwise the extension
would just pop into existence and then stumble around looking for a
running interpreter.

If you're interested in how the interpreter goes about loading your
extension technically, search around for 'dlopen' (at least on a Linux
system).

-tim

Lyle Johnson

9/13/2006 7:25:00 PM

On 9/13/06, John Gabriele <jmg3000@gmail.com> wrote:

> When you load an extension module, what's the mechanism that makes
> those C calls (the ones that call into the Ruby API) actually get
> connected to the currently-running instance of ruby?

When you use the "require" method to load an extension module, Ruby
takes the name of the feature that you're trying to require and looks
for a shared library of that name somewhere in its library load path.
So for example, if you type:

require 'foobar'

Ruby's going to try to find foobar.so (or foobar.bundle, or whatever's
appropriate for your operating system) somewhere in the $LOAD_PATH.
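As a rough illustration of that search step, here is a simplified sketch in C. It is not Ruby's actual code: the helper name find_in_load_path is made up, the directory array just stands in for $LOAD_PATH, and real Ruby also tries other suffixes (such as .rb) rather than a single one.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not Ruby's real code): try "dir/feature+suffix" for
 * each directory in order, as a stand-in for walking $LOAD_PATH. On success,
 * copy the first path that exists into `out` and return 1; otherwise 0. */
int find_in_load_path(const char *const dirs[], size_t ndirs,
                      const char *feature, const char *suffix,
                      char *out, size_t outlen)
{
    for (size_t i = 0; i < ndirs; i++) {
        /* dir + '/' + feature + suffix + terminating NUL */
        size_t need = strlen(dirs[i]) + 1 + strlen(feature) + strlen(suffix) + 1;
        if (need > outlen)
            continue;                      /* would not fit: skip this dir */
        snprintf(out, outlen, "%s/%s%s", dirs[i], feature, suffix);
        if (access(out, F_OK) == 0)
            return 1;                      /* first hit wins, so order matters */
    }
    return 0;
}
```

For require 'foobar' on Linux you would call it with feature "foobar" and suffix ".so" (".bundle" on Mac OS X), and the first directory containing the file wins.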

Ruby uses an OS-specific function call to dynamically load that shared
library into memory, and another OS-specific function call to obtain a
pointer to a function in that library named "Init_foobar". (If you
want more specifics about which functions Ruby uses, check out the
dln.c file in the Ruby source code.) If Ruby fails to get a pointer to
that Init_foobar() function, the require operation's going to fail.
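The naming rule described here ("foobar.so" is expected to export "Init_foobar") can be sketched as below. This is a simplified illustration under that assumption only; the helper name init_func_name is hypothetical, and dln.c has the real logic, which may treat edge cases differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn the path of a found extension into the name of
 * its entry point, e.g. "/some/dir/foobar.so" -> "Init_foobar".
 * Directories and everything from the first '.' of the file name on are
 * dropped. Returns 1 on success, 0 if `out` is too small. */
int init_func_name(const char *path, char *out, size_t outlen)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;                          /* strip directories */
    const char *dot = strchr(base, '.');
    size_t len = dot ? (size_t)(dot - base) : strlen(base); /* strip suffix */
    int n = snprintf(out, outlen, "Init_%.*s", (int)len, base);
    return n >= 0 && (size_t)n < outlen;
}
```

The resulting string is the name that gets looked up (with dlsym() on Linux) once the library has been loaded.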

Once Ruby has a pointer to the Init_foobar() function that's defined in
the foobar.so shared library -- it calls it! And that's where you, the
extension writer, come in. You are the person who actually defines the
Init_foobar() function for initializing your extension module. In that
function, you should make various calls into Ruby's extension API to
define the modules, classes and methods that make up your extension
module.

Hope this helps,

Lyle

jmg3000

9/13/2006 8:53:00 PM

On 9/13/06, Tim Becker <a2800276@gmail.com> wrote:
> > But when we're building an extension module, how do you tell GCC
> > (at link-edit time) that you want your code to link to (at runtime)
> > what's already loaded and running -- that is, to link to the ruby
> > interpreter -- rather than to some shared lib somewhere?
>
> I think the point you're missing is that at runtime the interpreter
> loads your extension, not the other way around.

Right. But, before that -- at link-edit time, when I build the
extension module with "gcc -shared", how exactly do I patiently
explain to gcc, "yes, there are functions in this code that start with
rb_, yes they're described in ruby.h, no you may not look at the code
they'll be calling at runtime. Sorry."? I guess I'm only familiar with
the common case where you're telling gcc to link up with other libs
(via "-L" to tell it non-standard places to look, and "-l" to specify
the shared libs) -- at link-edit time, doesn't gcc always have to see
the code that *your* code will later be calling?

> Else the extension
> would just pop into existence and then stumble around looking for a
> running interpreter.
>
> If you're interested in how the interpreter goes about loading your
> extension technically, search around for 'dlopen' (at least on a Linux
> system).
>

Yes. I've glanced at dlopen in the past. C code uses it when it wants
to load other libs at runtime. It looks to me like dlopen asks the OS
for a given object, and then the OS hunts around, finds it, loads it,
and then hands back some kind of file pointer to it. So, I'm guessing
ruby uses dlopen to load extension modules.
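That's the gist, with one correction: what dlopen() hands back is an opaque handle (a void *), not a file pointer, and you pass that handle to dlsym() to look up a symbol by name. A minimal sketch follows; the helper name lookup_symbol is made up for illustration, and on older glibc versions the program has to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared object (or, when `path` is NULL, the
 * running program itself) and look up `name` in it. Returns the symbol's
 * address, or NULL on failure. */
void *lookup_symbol(const char *path, const char *name)
{
    void *handle = dlopen(path, RTLD_LAZY);   /* opaque handle, not a FILE* */
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    dlerror();                                /* clear any stale error */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return NULL;
    }
    /* Deliberately no dlclose() here: unloading the object could leave
     * `sym` pointing at unmapped memory. */
    return sym;
}
```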

Thanks,
---John

jmg3000

9/13/2006 9:06:00 PM

On 9/13/06, Lyle Johnson <lyle.johnson@gmail.com> wrote:
> On 9/13/06, John Gabriele <jmg3000@gmail.com> wrote:
>
> > When you load an extension module, what's the mechanism that makes
> > those C calls (the ones that call into the Ruby API) actually get
> > connected to the currently-running instance of ruby?
>
> When you use the "require" method to load an extension module,
> [snip insightful explanation]
> Hope this helps,

Thanks for the explanation Lyle. I'm sorry, but perhaps I was unclear
(and I'm also still not understanding this). I'm looking to find out the
mechanism at work *at link-edit time*, when you're building the
extension module. I mean, what are the args necessary to pass to gcc
to tell it that, when your code makes those rb_foo calls, those calls
are actually supposed to bind to something besides a .so file sitting
on your harddisk?

I can at least guess that, when Ruby loads the extension module, at
that point it can probably do some magic to make sure the C calls in
the extension module actually call code that's already loaded in
memory (from inside the ruby binary itself). But what I'd like to
understand is how to tell gcc that this is how it's going to happen
at runtime.

Thanks,
---John

Joel VanderWerf

9/13/2006 9:22:00 PM

John Gabriele wrote:
> On 9/13/06, Lyle Johnson <lyle.johnson@gmail.com> wrote:
>> On 9/13/06, John Gabriele <jmg3000@gmail.com> wrote:
>>
>> > When you load an extension module, what's the mechanism that makes
>> > those C calls (the ones that call into the Ruby API) actually get
>> > connected to the currently-running instance of ruby?
>>
>> When you use the "require" method to load an extension module,
>> [snip insightful explanation]
>> Hope this helps,
>
> Thanks for the explanation Lyle. I'm sorry, but perhaps I was unclear
> (and also still not understanding this). I'm looking to find out the
> mechanism at work *at link-edit time*, when you're building the
> extension module. I mean, what are the args necessary to pass to gcc
> to tell it that, when your code makes those rb_foo calls, those calls
> are actually supposed to bind to something besides a .so file sitting
> on your harddisk?

I'm far from an expert on dynamic linking, so I can't tell you how the
mechanism really works, but I think what you're talking about is the
-shared option to gcc. From man gcc:

-shared
Produce a shared object which can then be linked with other objects
to form an executable. Not all systems support this option. For
predictable results, you must also specify the same set of options
that were used to generate code (-fpic, -fPIC, or model suboptions)
when you specify this option.

This option is supplied automatically in the Makefile that is generated
when you use the usual extconf.rb and mkmf.rb approach to build an
extension.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Lyle Johnson

9/14/2006 12:33:00 AM

On 9/13/06, John Gabriele <jmg3000@gmail.com> wrote:

> Thanks for the explanation Lyle. I'm sorry, but perhaps I was unclear
> (and also still not understanding this). I'm looking to find out the
> mechanism at work *at link-edit time*, when you're building the
> extension module. I mean, what are the args necessary to pass to gcc
> to tell it that, when your code makes those rb_foo calls, those calls
> are actually supposed to bind to something besides a .so file sitting
> on your harddisk?

Oh, OK. Well, I can't tell you the *specific* arguments because
they're highly platform-dependent. But the easiest way (IMO) to find
out what they should be is to write an extconf.rb script (a standard
fixture for any Ruby extension) and then run that. I won't go into
detail about the contents of an extconf.rb script, because that's
covered in a number of places (including "Programming Ruby" by Dave
Thomas et al). But the bottom line is that running extconf.rb will
produce a platform-specific Makefile for compiling and linking your
extension module. And once you have that Makefile, you can presumably
back out the command-line arguments that it's using to compile and
link your C code into a shared library.

Vincent Fourmond

9/14/2006 7:25:00 AM

Hello !

> But when we're building an extension module, how do you tell GCC
> (at link-edit time) that you want your code to link to (at runtime)
> what's already loaded and running -- that is, to link to the ruby
> interpreter -- rather than to some shared lib somewhere?

Well, this is highly platform-dependent. If I understand you right,
you're asking how the system knows where to find the rb_* functions
called from within your C code, even though you didn't link with the
appropriate library? The answer is relatively simple. The Makefile
produced by extconf.rb contains linker-specific flags to say: "don't
bother to look for missing symbols at link time, do this at run time".
What happens next is that when your extension is loaded, the dynamic
linker sees there are missing symbols, and it looks for them in the
current namespace. From within a ruby interpreter, the symbols are
already there and everything goes fine. But if for some reason it
doesn't find the symbols, your program will crash with an undefined
symbol error.

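Just as an example, try this with a Ruby extension:

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *a = dlopen("./ruby_extension.so", RTLD_LAZY);
    if (!a) { fprintf(stderr, "%s\n", dlerror()); return 1; } /* might already fail here */
    /* make sure to use the right Init_ function name */
    void (*func)(void) = (void (*)(void)) dlsym(a, "Init_extension");
    if (!func) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    func(); /* will crash here if not before: no rb_* symbols in this process */
    return 0;
}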

If you want to inspect this in more detail, I advise you to have a
look at nm, and especially at the output of nm -D on a shared object
(a small one, if possible).

Cheers !

Vince

jmg3000

9/14/2006 7:52:00 AM

On 9/13/06, Lyle Johnson <lyle.johnson@gmail.com> wrote:
> On 9/13/06, John Gabriele <jmg3000@gmail.com> wrote:
>
> > Thanks for the explanation Lyle. I'm sorry, but perhaps I was unclear
> > (and also still not understanding this). I'm looking to find out the
> > mechanism at work *at link-edit time*, when you're building the
> > extension module. [snip]
>
> Oh, OK. Well, I can't tell you the *specific* arguments because
> they're highly platform-dependent. But the easiest way (IMO) to find
> out what they should be is to write an extconf.rb script (a standard
> fixture for any Ruby extension) and then run that. [snip]

Ok. I think I partly get it now. Details follow for anyone who's interested.

To save some time, I grabbed the sample extension from
http://www.rubyinside.com/how-to-create-a-ruby-extension-in-c-in-under-5-minute...
and it builds and runs fine (note, I've got Ruby installed in /opt/ruby-1.8.4) :

==== snip ====
module-experiment/MyTest$ ruby extconf.rb
creating Makefile

module-experiment/MyTest$ ls
extconf.rb Makefile MyTest.c

module-experiment/MyTest$ make
gcc -fPIC -g -O2 -I. -I/opt/ruby-1.8.4/lib/ruby/1.8/i686-linux -I/opt/ruby-1.8.4/lib/ruby/1.8/i686-linux -I. -c MyTest.c
MyTest.c:23:2: warning: no newline at end of file

gcc -shared -L'/opt/ruby-1.8.4/lib' -Wl,-R'/opt/ruby-1.8.4/lib' -o mytest.so MyTest.o -ldl -lcrypt -lm -lc

module-experiment/MyTest$ ls -lh mytest.so
-rwxr-xr-x 1 john john 8.2K 2006-09-14 02:48 mytest.so

module-experiment/MyTest$ cd ..

module-experiment$ ls
MyTest mytest.rb

module-experiment$ ruby mytest.rb
10
==== /snip ====

The compile command is simple, though it contains some harmless redundancy (the same -I flags appear twice).

The fancy options in the linker command (this is Linux-/GCC-/ELF-specific) are:

* The "-shared" tells the link-editor to build a shared object (a .so file).

* The "-L" simply tells gcc where it can find libraries to link to at
link-edit time. As you can see from the size of mytest.so (8.2 kB),
it's certainly not statically linking in my libruby-static.a. Note
that in the MyTest.c file, there's at least one call to a rb_foo
function, so I'd think that gcc at least needs to *look* at
libruby-static.a...

* The "-Wl,-R..." option means to pass the "-R'/opt/ruby-1.8.4/lib'"
option to the link-editor. Looking it up (see "man ld"), I see that it
means for the link-editor to add the /opt/ruby-1.8.4/lib directory to
the runtime search path, and also, as the docs say: "The -rpath option
is also used when locating shared objects which are needed by shared
objects explicitly included in the link; see the description of the
-rpath-link option."

* Then there's that "-ldl"...

Anyhow, this is where things are still fuzzy for me (though it could
be that it's 3:30 in the morning). How gcc builds mytest.so so it can
later hook into ruby is probably accomplished by some combination of
that -R option, that libdl.so shared lib (which seems to supply
dlopen()), and maybe even the /opt/ruby-1.8.4/lib/libruby-static.a
static lib. Not sure. Anyway, there seems to be some semi-deep
GNU/Linux magic happening here.

Thanks!
---John

jmg3000

9/14/2006 8:09:00 AM

On 9/14/06, Vincent Fourmond <vincent.fourmond@9online.fr> wrote:
>
> Hello !

Hi Vincent. Thanks for the reply.

>
> > But when we're building an extension module, how do you tell GCC
> > (at link-edit time) that you want your code to link to (at runtime)
> > what's already loaded and running -- that is, to link to the ruby
> > interpreter -- rather than to some shared lib somewhere?
>
> Well, this is highly platform-dependent. If I understand you right,
> you're asking how the system knows where to find the rb_* functions
> called from within your C code, even though you didn't link with the
> appropriate library? The answer is relatively simple. The Makefile
> produced by extconf.rb contains linker-specific flags to say: "don't
> bother to look for missing symbols at link time, do this at run time".

Ohhhhhhhhhhhhhhhhhhhh.

Ok. Hm. Well then. Maybe that's the point of that "-R" link-editor arg
(mentioned in that other post I just made a few minutes ago before
seeing this one). The ld docs on -R/-rpath don't specifically say what
you so clearly express above, but they might *imply* that if read
under the right intensity lights, wearing those cheap bi-colored
3D-movie glasses, while howling under a full-moon... :)

> What happens next is that when your extension is loaded, the dynamic
> linker sees there are missing symbols, and it looks for them in the
> current namespace. From within a ruby interpreter, the symbols are
> already there and everything goes fine. But if for some reason it
> doesn't find the symbols, your program will crash with an undefined
> symbol error.

Got it.

> [snip example code and advice on nm -D]

Thanks again Vince! :)

---John

Vincent Fourmond

9/14/2006 8:17:00 AM

Hi !

> Ok. Hm. Well then. Maybe that's the point of that "-R" link-editor arg
> (mentioned in that other post I just made a few minutes ago before
> seeing this one). The ld docs on -R/-rpath don't specifically say what
> you so clearly express above, but they might *imply* that if read
> under the right intensity lights, wearing those cheap bi-colored
> 3D-movie glasses, while howling under a full-moon... :)

Did you try turning your screen upside down?

> Thanks again Vince! :)

No problem. I've had lots of trouble (and gained lots of experience)
trying to port some stuff from Linux to MacOS, where the dynamic
loader doesn't work the same way at all... Awful.

Good day to all !

Vince

comp.lang.ruby
