Asp Forum - Concurrent Ruby?

Kyle Murphy

7/29/2008 4:17:00 AM

Apologies if this is a really stupid question, I am new to programming,
but after reading about Erlang and its speed increase on multi-core
devices I had to ask.

With Matz supposedly making Ruby 2.0 right now, is it possible to make
it concurrent like Erlang so as to take advantage of the future
multi-core devices? Thank you.
--
Posted via http://www.ruby-....

13 Answers

David Masover

7/29/2008 6:56:00 AM

On Monday 28 July 2008 23:17:22 Kyle Murphy wrote:
> With Matz supposedly making Ruby 2.0 right now, is it possible to make
> it concurrent like Erlang

Not like Erlang, no.

Erlang does a couple of things differently. The most obvious one, which makes
it so scalable, is the message-passing -- Erlang uses "processes" and
message-passing almost as a programming paradigm. We talk
about "Object-Oriented Programming"; Erlang people talk
about "Concurrency-Oriented Programming".

These are much easier to write and scale than threads, and they perform much
better than single threads.

There are a few of us working to rectify this situation, at least
semantically -- there's Revactor, Dramatis, and my own unreleased project
which I've been wasting a few weekend hours on.

Another reason, which I'm running into while working on the above project, is
that Erlang has no mutable data. It even goes so far as to make variables
single-assignment, which is just annoying, but the data structures themselves
are never changed. Take a simple (contrived) Ruby example:

def some_function(options={})
  options[:foo] ||= 'Foo'
  options[:bar] ||= 'Bar'
  options[:foobar] ||= options[:foo] + options[:bar]

  some_file.each_line do |line|
    line.chomp!
    line.gsub!(/curses/i, '******')
    puts line
  end
end

See, we're changing things. Arrays, strings, whatever -- it's actually the
characters inside the string that are changing.

In Erlang, (almost) no data ever changes, you just create new data. Which
means that when you send a message to another process, it's as simple as
sending a pointer across -- which means it's not only a constant-time
operation, it's an absurdly cheap constant-time operation. So the data is
shared, but because it never changes, you don't have to lock it.

Which means that in Erlang, message-passing is so cheap we don't have to worry
about it. If we ported the message-passing to Ruby, it's either unreliable or
it's massively expensive and still somewhat unreliable. I'm not sure there's
a good way around this, though if there is, I intend to find it.
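The trade-off described above can be seen with a plain Queue. This is only a minimal sketch, and send_then_mutate is a made-up helper name, not part of any actor library: pushing a message enqueues a reference, so a sender that keeps mutating the object changes what the receiver sees, unless the message was frozen (or copied) first.

```ruby
require 'thread'

# Hypothetical helper: push a message, mutate it on the sender side,
# then return what the receiver actually gets out of the queue.
def send_then_mutate(message)
  inbox = Queue.new
  inbox.push(message)          # only the reference is enqueued, no copy
  begin
    message << " (edited)"     # sender keeps mutating "its" string
  rescue RuntimeError          # FrozenError is a RuntimeError subclass
    # a frozen message cannot be changed after sending
  end
  inbox.pop
end

puts send_then_mutate("hello".dup)          # prints "hello (edited)"
puts send_then_mutate("hello".dup.freeze)   # prints "hello"
```

Sharing the reference is cheap but unsafe; freezing or copying is safe but costs either convenience or time.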

> so as to take advantage of the future
> multi-core devices? Thank you.

This might happen -- maybe, sort of. Keeping all of the above in mind,
threading in Ruby is modeled after the traditional C and Java model, which
means they're probably more expensive to create, and certainly more
dangerous, which means there won't be as many of them.

On top of all that...

Right now, Ruby shares a problem with Python called the GIL -- the Global (or
Giant) Interpreter Lock. What this means is that only one Ruby instruction
may execute at a time. So even though they're using separate OS threads, and
even though different Ruby threads might run on different cores, the speed of
your program (at least the Ruby part) is limited to the speed of a single
core.
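A rough way to observe this is to time the same CPU-bound work run serially and in threads. The sketch below makes no claim about the numbers; busy_sum, run_serial and run_threaded are hypothetical names used only for illustration. On an interpreter with a global lock the two timings would be expected to come out similar, while on an implementation without one (such as JRuby) the threaded run may be faster.

```ruby
require 'benchmark'

# Hypothetical CPU-bound task used only for illustration.
def busy_sum(n)
  (1..n).reduce(0) { |acc, i| acc + i * i }
end

# Run every job one after another on the calling thread.
def run_serial(jobs)
  jobs.map { |n| busy_sum(n) }
end

# Run every job in its own Ruby thread and collect the results.
def run_threaded(jobs)
  jobs.map { |n| Thread.new { busy_sum(n) } }.map(&:value)
end

jobs = [200_000] * 4
serial   = Benchmark.realtime { run_serial(jobs) }
threaded = Benchmark.realtime { run_threaded(jobs) }
puts format("serial: %.3fs, threaded: %.3fs", serial, threaded)
```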

The standard response, which you'll probably already see (since I'm taking the
time to write a longer answer), is that you can do threading in two ways:
Either fork off a whole new Ruby process, so you probably can't have any
shared-memory problems -- and/or write the expensive parts in C, and have
your C extension release the Ruby GIL.

(See, you can have more than one bit of C code running in a Ruby program at
once, even alongside all the Ruby stuff -- at least until they need to do
something with Ruby itself.)
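The first of those two options, forking separate processes, might look roughly like the sketch below. It assumes a Unix-like system where fork is available, and square_sum and run_forked are made-up names. Each child computes one result and sends it back to the parent over a pipe using Marshal, so nothing is shared in memory.

```ruby
# Hypothetical CPU-bound task used only for illustration.
def square_sum(n)
  (1..n).reduce(0) { |acc, i| acc + i * i }
end

# Run each job in its own child process and collect results over pipes.
def run_forked(jobs)
  children = jobs.map do |n|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      writer.write(Marshal.dump(square_sum(n)))
      writer.close
      exit!(0)            # skip at_exit handlers inherited from the parent
    end
    writer.close          # parent keeps only the read end
    [pid, reader]
  end

  # Read each child's result, then reap the child.
  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

p run_forked([10, 100, 1000])
```

Because every child is a full process, the operating system is free to schedule them on different cores, at the price of serializing whatever crosses the pipe.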

There's also JRuby, which uses Java's native threads, and has no GIL. There
have been some problems with them lately, but they should work -- but again,
keep all of the above in mind. You'll be threading as well as Java does, not
as well as Erlang does.

As you can probably tell, I'm not really happy about all of this.

Now, unlike Python, it looks as though the Ruby GIL might eventually be
removed. And there is JRuby. And there's the various actor projects (mine
included). So it's conceivable that we'd get Ruby scalable to arbitrary
numbers of processors.

But again, I suspect Erlang is still going to do it better, if all you care
about is multicore and efficiency. (Ruby is doing a better job of Unicode,
has much more library support, and I much prefer its syntax.)

Florian Gilcher

7/29/2008 11:01:00 AM

On Jul 29, 2008, at 8:55 AM, David Masover wrote:

> a long mail

Nice writeup. You forgot one thing about Erlang, though: It is
(mostly) sideeffect-free while
object orientated languages always rely on sideeffects.
This makes it harder when it comes to concurrency.

Regards,
Skade

Robert Klemme

7/29/2008 3:03:00 PM

2008/7/29 Florian Gilcher <flo@andersground.net>:
> On Jul 29, 2008, at 8:55 AM, David Masover wrote:
>
>> a long mail
>
> Nice writeup.

Absolutely agree. Thanks David!

> You forgot one thing about Erlang, though: It is (mostly)
> sideeffect-free while

Well, he said that data does not change which is basically the same.

> object orientated languages always rely on sideeffects.

I'd rather say "usually" because immutable classes are quite common.

> This makes it harder when it comes to concurrency.

Obviously.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

David Masover

7/29/2008 4:00:00 PM

On Tuesday 29 July 2008 06:01:10 Florian Gilcher wrote:
>
> On Jul 29, 2008, at 8:55 AM, David Masover wrote:
>
> > a long mail
>
> Nice writeup.

Thanks!

> You forgot one thing about Erlang, though: It is
> (mostly) sideeffect-free while
> object orientated languages always rely on sideeffects.

If I understand it right, side effects in Erlang simply take a different form.
Nothing's stopping me from sending random, spurious messages in the middle of
a supposedly-innocuous function.

I did talk about data not being mutable, which provides both a semantic
(lock-free) and a technical advantage (raw speed).

I'm trying to figure out how to at least partly duplicate the semantic
advantage in Ruby, but it's not easy -- I'm stuck either #freeze-ing
everything, or wrapping every message in an actor of its own, and both
approaches seem more obnoxious and error-prone than forcing the developer to
deal with it.
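To make the "#freeze everything" option concrete, here is a minimal sketch of a recursive freeze. deep_freeze is a hypothetical helper, not an existing Ruby method, and it assumes that anything already frozen is frozen all the way down, which also keeps it from looping on cyclic structures.

```ruby
# Hypothetical helper: recursively freeze a message before handing it
# to another thread. Objects that are already frozen are skipped, on the
# assumption that their contents were frozen too.
def deep_freeze(obj)
  return obj if obj.frozen?
  obj.freeze                   # freeze first so cycles terminate
  case obj
  when Hash
    obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array
    obj.each { |item| deep_freeze(item) }
  else
    obj.instance_variables.each do |ivar|
      deep_freeze(obj.instance_variable_get(ivar))
    end
  end
  obj
end

message = deep_freeze({ to: "worker", body: [["a".dup, "b".dup], ["c".dup]] })

begin
  message[:body][0][0] << "!"  # any in-place change is now refused
rescue RuntimeError => e       # FrozenError on newer Rubies, a RuntimeError subclass
  puts "rejected: #{e.class}"
end
```

This shows why the approach feels obnoxious: every message has to be walked before sending, and the sender loses the ability to keep working on its own data.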

Charles Oliver Nutter

7/29/2008 4:57:00 PM

David Masover wrote:
> There's also JRuby, which uses Java's native threads, and has no GIL. There
> have been some problems with them lately, but they should work -- but again,
> keep all of the above in mind. You'll be threading as well as Java does, not
> as well as Erlang does.

I'm not sure what you mean by problems...there have not been problems
with them lately; they work as you'd expect native threads to work. They
do require a bit more diligence on your part if you're sharing data
across the threads, since for performance reasons we don't do any extra
synchronization of e.g. Array, Hash, String. But native threads work
fine on JRuby.

- Charlie
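The "bit more diligence" for shared data usually means an explicit lock. A minimal sketch, with collect_concurrently as a made-up name: several threads append to one Array, and a Mutex guards each append so the result does not depend on whether the Ruby implementation synchronizes Array internally.

```ruby
require 'thread'

# Hypothetical example: several threads append to one shared Array.
# The Mutex ensures only one thread touches the Array at a time.
def collect_concurrently(thread_count, per_thread)
  results = []
  lock = Mutex.new
  threads = Array.new(thread_count) do |t|
    Thread.new do
      per_thread.times { |i| lock.synchronize { results << [t, i] } }
    end
  end
  threads.each(&:join)
  results
end

puts collect_concurrently(4, 1_000).size   # prints 4000
```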

ara.t.howard

7/29/2008 5:14:00 PM

On Jul 29, 2008, at 10:00 AM, David Masover wrote:

> I'm trying to figure out how to at least partly duplicate the semantic
> advantage in Ruby, but it's not easy -- I'm stuck either #freeze-ing
> everything, or wrapping every message in an actor of its own, and both
> approaches seem more obnoxious and error-prone than forcing the
> developer to
> deal with it.

fan out multiple processes with a message queue each - easy to do with
drb. naive impl:

cfp:~> ruby a.rb
b got "hello" (pid=94677)
a got "hello" (pid=94676)

cfp:~> cat a.rb

a =
  actor {
    recv_msg { |msg|
      puts "a got #{ msg.inspect } (pid=#{ Process.pid })"
    }
  }

b =
  actor {
    recv_msg { |msg|
      puts "b got #{ msg.inspect } (pid=#{ Process.pid })"
      a.send_msg msg
    }
  }

b.send_msg 'hello'

STDIN.gets

BEGIN {

  require 'rubygems'
  require 'thread'
  require 'drb'
  require 'slave'

  class Actor
    include ::DRb::DRbUndumped

    def initialize &block
      @q = Queue.new
      @block = block
      act!
    end

    def act!
      @thread = Thread.new do
        Thread.current.abort_on_exception = true
        instance_eval &@block
      end
    end

    def send_msg message
      @q.push message
    end

    def recv_msg
      while(( message = @q.pop ))
        yield message
      end
    end
  end

  def actor(*a, &b)
    Slave.new{ Actor.new(*a, &b) }.object
  end

  STDOUT.sync = true

}

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

David Masover

7/30/2008 4:33:00 AM

On Tuesday 29 July 2008 12:13:53 ara.t.howard wrote:
>
> On Jul 29, 2008, at 10:00 AM, David Masover wrote:
>
> > I'm trying to figure out how to at least partly duplicate the semantic
> > advantage in Ruby, but it's not easy -- I'm stuck either #freeze-ing
> > everything, or wrapping every message in an actor of its own, and both
> > approaches seem more obnoxious and error-prone than forcing the
> > developer to
> > deal with it.
>
> fan out multiple processes with a message queue each - easy to do with
> drb.

That implies a full copy (I think), which isn't always what's needed.

Without actually testing your implementation, what happens when I send, say, a
reference to an actor? (Kind of an essential feature.)

And without actually doing any benchmarks (how's that for naive?), I still
find it hard to believe that DRb+Queue would scale better than Thread+Queue,
for large numbers of actors. (Keep in mind, it's not unusual for an Erlang
program to have thousands of processes.)

Given that I still have a vague hope that YARV will eventually remove the GIL,
I'd rather stick to Threads, if I can make them safe.
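For comparison, a Thread+Queue variant of the same idea could be sketched as follows. ThreadActor, send_msg and stop are hypothetical names chosen for this illustration; this is not the code of Revactor, Dramatis, or any project mentioned in the thread. Each actor is one thread draining one queue, and messages are passed by reference inside a single process.

```ruby
require 'thread'

# Hypothetical in-process actor: one Thread draining one Queue.
class ThreadActor
  STOP = Object.new            # private sentinel that ends the loop

  def initialize(&handler)
    @inbox = Queue.new
    @thread = Thread.new do
      loop do
        message = @inbox.pop   # blocks until a message arrives
        break if message.equal?(STOP)
        handler.call(message)
      end
    end
  end

  def send_msg(message)
    @inbox.push(message)
    self
  end

  # Finish the messages already queued, then end the thread.
  def stop
    @inbox.push(STOP)
    @thread.join
  end
end

log = Queue.new
a = ThreadActor.new { |msg| log << "a got #{msg.inspect}" }
b = ThreadActor.new { |msg| log << "b got #{msg.inspect}"; a.send_msg(msg) }

b.send_msg('hello')
b.stop
a.stop
puts log.pop until log.empty?
```

Nothing here protects a message from being mutated by both sides, which is exactly the safety problem under discussion.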

David Masover

7/30/2008 4:36:00 AM

On Tuesday 29 July 2008 11:56:43 Charles Oliver Nutter wrote:
> David Masover wrote:
> > There's also JRuby, which uses Java's native threads, and has no GIL. There
> > have been some problems with them lately, but they should work -- but again,
> > keep all of the above in mind. You'll be threading as well as Java does, not
> > as well as Erlang does.
>
> I'm not sure what you mean by problems...there have not been problems
> with them lately;

Maybe it wasn't actually "lately".

And there's still the rest of it:

> They
> do require a bit more diligence on your part if you're sharing data
> across the threads,

That's the whole problem that I'm attacking right now -- while a pure actor
model wouldn't share any data, I'm not even sure I can safely clone
everything properly, if I was going that route. And I'd rather not, for
obvious performance reasons.

ara.t.howard

7/30/2008 5:34:00 AM

On Jul 29, 2008, at 10:33 PM, David Masover wrote:

> That implies a full copy (I think), which isn't always what's needed.
>
> Without actually testing your implementation, what happens when I
> send, say, a
> reference to an actor? (Kind of an essential feature.)

DRb handles references. DRbUndumped provides a means to pass
references to remote objects around.

>
>
> And without actually doing any benchmarks (how's that for naive?), I
> still
> find it hard to believe that DRb+Queue would scale better than Thread
> +Queue,
> for large numbers of actors. (Keep in mind, it's not unusual for an
> Erlang
> program to have thousands of processes.)

no doubt that's true. processes can help you now though - especially
since threads don't scale right now in ruby with multi processor
machines.

>
>
> Given that I still have a vague hope that YARV will eventually
> remove the GIL,
> I'd rather stick to Threads, if I can make them safe.

sure, but if you want to burn up processors you simply have to use
processes attm.

you might find this interesting

http://groups.google.com/group/ruby-talk-google/browse_thread/thread/b4e346478eeeead4/0cbc4a86f2237476?lnk=gst&q=threadify+jruby#0cbc4a...

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

David Masover

7/30/2008 6:02:00 AM

On Wednesday 30 July 2008 00:33:53 ara.t.howard wrote:
>
> On Jul 29, 2008, at 10:33 PM, David Masover wrote:
>
> > That implies a full copy (I think), which isn't always what's needed.
> >
> > Without actually testing your implementation, what happens when I
> > send, say, a
> > reference to an actor? (Kind of an essential feature.)
>
> DRb handles references. DRbUndumped provides a means to pass
> references to remote objects around.

Alright. What if I send a complex datastructure? Strings, I can live with, but
what about multidimensional arrays?
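DRb passes ordinary arguments by value using Marshal, so a nested structure sent that way is effectively deep-copied. The sketch below only illustrates what such a round trip does to a multidimensional array, compared with a shallow dup; marshal_copy is a made-up helper name.

```ruby
# Hypothetical helper: deep-copy an object via a Marshal round trip.
def marshal_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

grid = [[1, 2], [3, 4]]

shallow = grid.dup             # new outer Array, same inner Arrays
deep    = marshal_copy(grid)   # every nested Array is rebuilt

grid[0] << 99

p shallow[0]   # prints [1, 2, 99] because the row is shared
p deep[0]      # prints [1, 2] because the row was copied
```

The copy is safe to hand to another process or thread, but its cost grows with the size of the structure, unlike passing a reference.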

> > And without actually doing any benchmarks (how's that for naive?), I
> > still
> > find it hard to believe that DRb+Queue would scale better than Thread
> > +Queue,
> > for large numbers of actors. (Keep in mind, it's not unusual for an
> > Erlang
> > program to have thousands of processes.)
>
> no doubt that's true. processes can help you now though - especially
> since threads don't scale right now in ruby with multi processor
> machines.

I believe work is going on to make Threads scale in 1.9 -- current 1.9 still
has a GIL, though.

They do scale in JRuby, and probably in IronRuby (haven't tried).

> > Given that I still have a vague hope that YARV will eventually
> > remove the GIL,
> > I'd rather stick to Threads, if I can make them safe.
>
>
> sure, but if you want to burn up processors you simply have to use
> processes attm.

Or I could use JRuby. Or IronRuby.

I don't want to burn up processors atm. I want to build an architecture which
will be able to burn up processors in the future. I want to solve concurrency
on a single machine once and be done with it -- without having to use Erlang.

>
> http://groups.google.com/group/ruby-talk-google/browse_thread/thre...
> 478eeeead4/0cbc4a86f2237476?lnk=gst&q=threadify+jruby#0cbc4a86f2237476

From that link:

"the sync overhead is prohibitive
for in memory stuff"

I am, specifically, interested in doing in-memory stuff. If I can solve that
problem, I'm not as worried about the network stuff, especially as others
have already solved that well enough (DRb and friends).

comp.lang.ruby