Asp Forum - Re: Are all Ruby built-in objects thread safe?

Yukihiro Matsumoto

12/23/2008 2:56:00 PM

Hi,

In message "Re: Are all Ruby built-in objects thread safe?"
on Tue, 23 Dec 2008 18:21:27 +0900, "Just Another Victim of the Ambient Morality" <ihatespam@hotmail.com> writes:

| Are all built-in objects thread safe?

For MRI (1.8.x) and YARV (1.9.x), every C implemented methods are
protected by GIL (Global Interpreter Lock), so that you don't have to
worry about. But it might depend on each implementation.

matz.

13 Answers

Charles Oliver Nutter

12/23/2008 10:28:00 PM

Yukihiro Matsumoto wrote:
> Hi,
>
> In message "Re: Are all Ruby built-in objects thread safe?"
> on Tue, 23 Dec 2008 18:21:27 +0900, "Just Another Victim of the Ambient Morality" <ihatespam@hotmail.com> writes:
>
> | Are all built-in objects thread safe?
>
> For MRI (1.8.x) and YARV (1.9.x), every C implemented methods are
> protected by GIL (Global Interpreter Lock), so that you don't have to
> worry about. But it might depend on each implementation.

In JRuby we've made a best attempt to keep the core data structures like
Array, Hash, and String as thread-safe as possible without introducing
locks (which would slow down single-threaded cases). There are cases
where when using Array or Hash you may get ConcurrencyErrors raised.
This is the trade-off for having fast lock-free collections.

I would very much welcome a set of guaranteed-thread-safe wrapper
collections folks could use if they're concerned about concurrency,
since it would be unreasonable to penalize all code with locking. For
now, learn and love Mutex, or don't share collections across threads.

- Charlie

Robert Dober

12/26/2008 9:43:00 AM

I would very much welcome a set of guaranteed-thread-safe wrapper
> collections folks could use if they're concerned about concurrency, since=
it
> would be unreasonable to penalize all code with locking. For now, learn a=
nd
> love Mutex, or don't share collections across threads.

That sounds like a fun job. The issue goes well with a question which
burns on the top of my tongue:
How to test thread issues? I do not have any experience in this field
but consider it somehow critical before "promising" functional thread
safe wrappers.

Cheers
R.

--=20
Il computer non =E8 una macchina intelligente che aiuta le persone
stupide, anzi, =E8 una macchina stupida che funziona solo nelle mani
delle persone intelligenti.
Computers are not smart to help stupid people, rather they are stupid
and will work only if taken care of by smart people.

Umberto Eco

Robert Klemme

12/26/2008 10:23:00 AM

On 26.12.2008 10:43, Robert Dober wrote:
> I would very much welcome a set of guaranteed-thread-safe wrapper
>> collections folks could use if they're concerned about concurrency, since it
>> would be unreasonable to penalize all code with locking. For now, learn and
>> love Mutex, or don't share collections across threads.
>
> That sounds like a fun job. The issue goes well with a question which
> burns on the top of my tongue:
> How to test thread issues? I do not have any experience in this field
> but consider it somehow critical before "promising" functional thread
> safe wrappers.

This is tricky. There are few things I'd like to say to this: first,
there is an easy way to provide _basic_ thread safety by wrapping all
method calls in a synchronized block - for this even a
SynchronizedDelegator would be sufficient which could be used for _all_
classes - not just collections. In this case, because of the way the
wrapping is done there is no need to test thread safety because - as
long as the delegator ensures that all methods are wrapped in this way
there is no chance of corrupting internal state.

But, in the general case thread safety cannot be achieved on the class
level. My typical example is this

if hash.contains_key? k
dat = hash[k]
else
dat = create_dat(k)
hash[k] = dat
end

The basic property of this bit of code which makes it impossible to
ensure thread safety on level of class Hash is that there are two
methods invoked on the same instance and there must not be any state
change between them because then you either end up with garbage (nil) in
"dat" or you invoke create_dat(k) more than once per key value and this
leads to different threads having a different idea of what hash[k] is.

So in this case you need a lock around the _complete_ block of code.
(Note, the method level lock would work if using a Hash with a default
block, but this solves only a small portion of the cases.) This is also
the reason why a general per method locking is only of limited use. It
only ensures consistency of the internal state of an object but can
never ensure overall concurrency correctness of an application.

Testing thread safety is difficult to impossible for several reasons.
One of the reasons is that for proper control of the test case you would
need to ensure exact timing of thread execution. While you could do
that I have never seen people actually doing this - maybe because this
will make test suites run much slower. Another reason is that you
vastly depend on the underlying machine (i.e. hardware, OS and VM). I
have seen Java programs break as soon as they were executed on a multi
core machine with more than n cores.

Things aren't made easier by the fact that - concluding from postings I
see on comp.lang.java.* newsgroups for example - few people seem to have
a thorough understanding of concurrency and all the issues involved.
Another item on the "makes it hard" list is the fact that most OO and
procedural languages only have a very basic toolkit for concurrency
control; while Java started out pretty good with built in "synchronized"
it took until version 5 that they incorporated Doug Lea's concurrency
utilities into the language's standard library and also into the EJB spec.

It's different in other programming languages: functional languages are
by definition thread safe because they are free of side effects. (At
least in theory. :-)) Also, other languages built for concurrent
applications which have a different programming model (e.g. Occam) of
course have much better support for concurrency.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end

Robert Dober

12/26/2008 6:17:00 PM

On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 26.12.2008 10:43, Robert Dober wrote:
>>
>> I would very much welcome a set of guaranteed-thread-safe wrapper
>>>
>>> collections folks could use if they're concerned about concurrency, sin=
ce
>>> it
>>> would be unreasonable to penalize all code with locking. For now, learn
>>> and
>>> love Mutex, or don't share collections across threads.
>>
>> That sounds like a fun job. The issue goes well with a question which
>> burns on the top of my tongue:
>> How to test thread issues? I do not have any experience in this field
>> but consider it somehow critical before "promising" functional thread
>> safe wrappers.
>
> This is tricky. There are few things I'd like to say to this: first, the=
re
> is an easy way to provide _basic_ thread safety by wrapping all method ca=
lls
> in a synchronized block - for this even a SynchronizedDelegator would be
> sufficient which could be used for _all_ classes - not just collections. =
In
> this case, because of the way the wrapping is done there is no need to te=
st
> thread safety because - as long as the delegator ensures that all methods
> are wrapped in this way there is no chance of corrupting internal state.
>
> But, in the general case thread safety cannot be achieved on the class
> level. My typical example is this
>
> if hash.contains_key? k
> dat =3D hash[k]
> else
> dat =3D create_dat(k)
> hash[k] =3D dat
> end
>
> The basic property of this bit of code which makes it impossible to ensur=
e
> thread safety on level of class Hash is that there are two methods invoke=
d
> on the same instance and there must not be any state change between them
> because then you either end up with garbage (nil) in "dat" or you invoke
> create_dat(k) more than once per key value and this leads to different
> threads having a different idea of what hash[k] is.
>
> So in this case you need a lock around the _complete_ block of code. (Not=
e,
> the method level lock would work if using a Hash with a default block, bu=
t
> this solves only a small portion of the cases.) This is also the reason =
why
> a general per method locking is only of limited use. It only ensures
> consistency of the internal state of an object but can never ensure overa=
ll
> concurrency correctness of an application.
>
> Testing thread safety is difficult to impossible for several reasons. One=
of
> the reasons is that for proper control of the test case you would need to
> ensure exact timing of thread execution. While you could do that I have
> never seen people actually doing this - maybe because this will make test
> suites run much slower. Another reason is that you vastly depend on the
> underlying machine (i.e. hardware, OS and VM). I have seen Java programs
> break as soon as they were executed on a multi core machine with more tha=
n n
> cores.
>
> Things aren't made easier by the fact that - concluding from postings I s=
ee
> on comp.lang.java.* newsgroups for example - few people seem to have a
> thorough understanding of concurrency and all the issues involved. Anothe=
r
> item on the "makes it hard" list is the fact that most OO and procedural
> languages only have a very basic toolkit for concurrency control; while J=
ava
> started out pretty good with built in "synchronized" it took until versio=
n 5
> that they incorporated Doug Lea's concurrency utilities into the language=
's
> standard library and also into the EJB spec.
>
> It's different in other programming languages: functional languages are b=
y
> definition thread safe because they are free of side effects. (At least =
in
> theory. :-)) Also, other languages built for concurrent applications whi=
ch
> have a different programming model (e.g. Occam) of course have much bette=
r
> support for concurrency.
>
> Kind regards
>
> robert
>
>
> --
> remember.guy do |as, often| as.you_can - without end
>
>
Robert, IIUC we do not want either the one, nor the other.
For what I am concerned one would need Read/Write Locks.
The built in methods like [] and []=3D would obtain read and write
locks, while the read lock automatically obtains a write lock and only
releases it when its count is to zero, the write lock would inhibit
the read lock to be obtained.

I do not know if one should allow user access to these locks? My first
thought would be no, all we want to do is to have thread-safe Hash,
Array, String but not to provide advanced synchronization mechanisms.
OTOH one could expose the write_lock and read_lock of each object
which would allow for the following

hash =3D TS_Hash::new
...
dat =3D hash.read_synchronize do
if hash.has_key? key then
hash[ key ]
else
hash.write_synchronize do
hash[key] =3D compute_data( key )
end
end
end
N.B. We have to solve the interlock problem here as normally the
write_synchronize does not obtain the write_lock
as the read_synchronize did get a read_lock. We would need to grant
the write lock if the read_lock is obtained by the current thread
*only*, but imagine two threads waiting for the write_lock while
containing the read_lock, the good old
philosophers-forks-spoon-spaghetti interlock. [ That is why we eat
spaghetti with a fork only ;) ]

Therefore I guess the RW-locking in a ThreadSafe Container class shall
rather not be exposed as we avoiding interlocking is not that
complicated IIRC

And if we expose read_synchronize &blk, and write_synchronize &blk we
should probably raise something like an IllegalMonitorState exception
if a thread tries to call the one inside the other.

More thoughts please.
Cheers
Robert

--=20
Il computer non =E8 una macchina intelligente che aiuta le persone
stupide, anzi, =E8 una macchina stupida che funziona solo nelle mani
delle persone intelligenti.
Computers are not smart to help stupid people, rather they are stupid
and will work only if taken care of by smart people.

Umberto Eco

Simon Krahnke

12/27/2008 8:55:00 AM

* Robert Klemme <shortcutter@googlemail.com> (11:23) schrieb:

> if hash.contains_key? k
> dat = hash[k]
> else
> dat = create_dat(k)
> hash[k] = dat
> end

> [...]

> So in this case you need a lock around the _complete_ block of code.
> (Note, the method level lock would work if using a Hash with a default
> block, but this solves only a small portion of the cases.)

If Hash had a method get_with_default that took a default value or a
block like initialize you could solve these cases.

mfg, simon .... l

Robert Dober

12/27/2008 10:09:00 AM

On Sat, Dec 27, 2008 at 10:04 AM, Simon Krahnke <overlord@gmx.li> wrote:
> * Robert Klemme <shortcutter@googlemail.com> (11:23) schrieb:
>
>> if hash.contains_key? k
>> dat = hash[k]
>> else
>> dat = create_dat(k)
>> hash[k] = dat
>> end
>
>> [...]
>
>> So in this case you need a lock around the _complete_ block of code.
>> (Note, the method level lock would work if using a Hash with a default
>> block, but this solves only a small portion of the cases.)
>
> If Hash had a method get_with_default that took a default value or a
I believe you mean
"get and assign with default"
because IIRC there is a get_with_default, it is simply called
Hash#fetch
R.

Robert Klemme

12/27/2008 10:53:00 AM

On 27.12.2008 09:55, Simon Krahnke wrote:
> * Robert Klemme <shortcutter@googlemail.com> (11:23) schrieb:
>
>> if hash.contains_key? k
>> dat = hash[k]
>> else
>> dat = create_dat(k)
>> hash[k] = dat
>> end
>
>> [...]
>
>> So in this case you need a lock around the _complete_ block of code.
>> (Note, the method level lock would work if using a Hash with a default
>> block, but this solves only a small portion of the cases.)
>
> If Hash had a method get_with_default that took a default value or a
> block like initialize you could solve these cases.

Please read my note: this was just an example for the class of
synchronization problems that cannot be solved within the class. You
can invent arbitrary more complex scenarios that would not make any
sense to be covered by a standard method in class Hash.

My point is totally independent of the functionality of class Hash.
This is about sequences of method invocations which do not tolerate any
state changes by other threads between method calls.

Put it differently: in concurrent applications there are many scenarios
where lock granularity "method" is insufficient to guarantee correct
code. Instead you need explicit locking for larger portions of _client_
code.

Kind regards

robert

Robert Klemme

12/27/2008 11:55:00 AM

On 26.12.2008 19:17, Robert Dober wrote:
> On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme
> <shortcutter@googlemail.com> wrote:

<snip>full quote</snip>

Please invest a bit more time in quoting and trimming. I may speak for
me only but I find it unnecessary hard to follow a discussion when there
is a full quote and no clear indication of references in your reply.

> Robert, IIUC we do not want either the one, nor the other.

What are you referring to with "one" and "other"?

> For what I am concerned one would need Read/Write Locks.

Why?

> The built in methods like [] and []= would obtain read and write
> locks, while the read lock automatically obtains a write lock and only
> releases it when its count is to zero, the write lock would inhibit
> the read lock to be obtained.

Let's first talk about semantics not jump into implementation. How the
read write locking is implemented is rather unimportant for the question
what kind of additional locking we want to propose for default
collections (if any). As far as I can see three variants are lying on
the table with the lock free collection always be present as well:

1. none at all, locking needs to be explicitly done in client code

2. exclusive locking, no two methods can execute concurrently on the
same instance

3. read write locking with the usual semantics, allowing multiple
methods with read semantic to be executed concurrently or at most one
"write" method at a time. "write" lock excludes any "read" locks.

I'll follow up with discussion at the end.

> I do not know if one should allow user access to these locks? My first
> thought would be no, all we want to do is to have thread-safe Hash,
> Array, String but not to provide advanced synchronization mechanisms.
> OTOH one could expose the write_lock and read_lock of each object
> which would allow for the following
>
> hash = TS_Hash::new
> ..
> dat = hash.read_synchronize do
> if hash.has_key? key then
> hash[ key ]
> else
> hash.write_synchronize do
> hash[key] = compute_data( key )
> end
> end
> end

This is a bad idiom because it is prone to deadlocking. When having
multiple levels of locks you must only go from stronger to weaker locks,
not the other way round. Otherwise this can happen

t1: read lock
t2: read lock
t1: write lock (deadlock, blocked by t2's read lock)

See also
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReentrantReadWrit...

> N.B. We have to solve the interlock problem here as normally the
> write_synchronize does not obtain the write_lock
> as the read_synchronize did get a read_lock. We would need to grant
> the write lock if the read_lock is obtained by the current thread
> *only*,

With this definition you force code to obtain a read lock before the
write lock can be held. I assume you mean that the rule should rather
be "grant the write lock only if no other thread holds the read lock" -
which is what rw locking usually does.

> but imagine two threads waiting for the write_lock while
> containing the read_lock, the good old
> philosophers-forks-spoon-spaghetti interlock. [ That is why we eat
> spaghetti with a fork only ;) ]

Exactly, as show above.

> Therefore I guess the RW-locking in a ThreadSafe Container class shall
> rather not be exposed as we avoiding interlocking is not that
> complicated IIRC
>
> And if we expose read_synchronize &blk, and write_synchronize &blk we
> should probably raise something like an IllegalMonitorState exception
> if a thread tries to call the one inside the other.

Not necessarily: read_synchronize inside write_synchronize is perfectly
ok although it has no noticeable effect. But since it can occur in
another method it is reasonable to allow it.

Some more remarks about the complexity of read write locking can be
found here

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReadWrit...

For the interested Wikipedia also has a number of pages on the matter
"locking":

http://en.wikipedia.org/...(computer_science)

Now, what would be reasonable to provide in a thread safe standard
collection? IMHO the discussion above and the complexity and wide
variance in implementation choices for read write locking makes it next
to impossible to come up with an optimal read write lock implementation
in standard collections for most application scenarios.

This yields a bad cost benefit ratio for option 3 above. This leaves us
with options 1 and 2. I lean toward option 1 because even the dead
simple exclusive lock approach is of limited use only. Even in light of
default values or the default_proc of a Hash there are some things to note.

Consider

# assuming Synchronize() wraps an instance with a delegator
# which synchronizes all method calls on a single lock
$all_values = Synchronize(Hash.new {|h,k| h[k] = Synchronize([])})
....
# in multiple threads:
$all_values[some_key] << item

While our basic locking ensures internal state of all collections is
consistent the code has the disadvantage that every thread needs to
obtain two locks and the number of locks is not limited (1 + no of
keys). A single lock would be sufficient here.

Another disadvantage of built in locking is that it might trick some
people in believing that this is all they need to use to get thread safe
code. If there would be no such thing as a default thread safe Array
and Hash people are forced to think about how they want to go about
locking and I believe that way they will get better (i.e. more correct
and more efficient) code.

As a compromise I suggest to provide something like the Synchronize()
method I showed above which does two things: 1. it extends the object
passed with MonitorMixin and 2. it wraps the instance with another
instance which synchronizes all methods like this:

class SynchWrapper
def initialize(o)
@obj = o.extend(::MonitorMixin)
end

# remove all Object methods

def method_missing(*a,&b)
@obj.synchronize { @obj.__send__(*a,&b) }
end
end

Advantage is that there is a single mechanism that uniformly works for
_all_ classes not just collections. And, the mechanism is made explicit
while not requiring too much manual intervention or additional typing.
For all cases where the logic requires other locking schemes explicit
locking needs to be employed anyway.

Kind regards

robert

PS: I just recognize we should probably move this discussion over to
ruby-core...

Robert Dober

12/27/2008 2:22:00 PM

On Sat, Dec 27, 2008 at 12:59 PM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 26.12.2008 19:17, Robert Dober wrote:
>>
>> On Fri, Dec 26, 2008 at 11:24 AM, Robert Klemme
>> <shortcutter@googlemail.com> wrote:
>
> <snip>full quote</snip>
>
> Please invest a bit more time in quoting and trimming. I may speak for m=
e
> only but I find it unnecessary hard to follow a discussion when there is =
a
> full quote and no clear indication of references in your reply.
Oh this was bad work, I messed up, my apologies.
>
>> Robert, IIUC we do not want either the one, nor the other.
>
> What are you referring to with "one" and "other"?
IIRC, because I indeed lost context, sorry again, I meant that your
observations, however useful and learned, did not have a direct impact
to what Charles wants here. Look e.g. at the synchronized containers
in Java. They are thread safe and yet all your concerns are still
valid. In other words you are already talking about problems of the
next level of abstraction while the level below still has problems. So
all I wanted to say is that we still should tackle the low level
thread-safety.
>
>> For what I am concerned one would need Read/Write Locks.
>
> Why?
Because we want simultaneous read access. And that was the other point
I referred to in my reply, we do not want to do exclusive reads.
>
>> The built in methods like [] and []=3D would obtain read and write
>> locks, while the read lock automatically obtains a write lock and only
>> releases it when its count is to zero, the write lock would inhibit
>> the read lock to be obtained.
>
> Let's first talk about semantics not jump into implementation. How the r=
ead
> write locking is implemented is rather unimportant for the question what
> kind of additional locking we want to propose for default collections (if
> any). As far as I can see three variants are lying on the table with the
> lock free collection always be present as well:
>
>
> 3. read write locking with the usual semantics, allowing multiple methods
> with read semantic to be executed concurrently or at most one "write" met=
hod
> at a time. "write" lock excludes any "read" locks.
3 by all means
>
>> I do not know if one should allow user access to these locks? My first
>> thought would be no, all we want to do is to have thread-safe Hash,
>> Array, String but not to provide advanced synchronization mechanisms.
>> OTOH one could expose the write_lock and read_lock of each object
>> which would allow for the following
>>
>> hash =3D TS_Hash::new
>> ..
>> dat =3D hash.read_synchronize do
>> if hash.has_key? key then
>> hash[ key ]
>> else
>> hash.write_synchronize do
>> hash[key] =3D compute_data( key )
>> end
>> end
>> end
>
> This is a bad idiom because it is prone to deadlocking.
That was my point to show why I am against exposure of the internal RW lock=

Simon Krahnke

12/28/2008 2:36:00 AM

* Robert Dober <robert.dober@gmail.com> (11:08) schrieb:

> On Sat, Dec 27, 2008 at 10:04 AM, Simon Krahnke <overlord@gmx.li> wrote:
>> * Robert Klemme <shortcutter@googlemail.com> (11:23) schrieb:
>>
>>> if hash.contains_key? k
>>> dat = hash[k]
>>> else
>>> dat = create_dat(k)
>>> hash[k] = dat
>>> end
>>
>>> [...]
>>
>>> So in this case you need a lock around the _complete_ block of code.
>>> (Note, the method level lock would work if using a Hash with a default
>>> block, but this solves only a small portion of the cases.)
>>
>> If Hash had a method get_with_default that took a default value or a
> I believe you mean
> "get and assign with default"
> because IIRC there is a get_with_default, it is simply called
> Hash#fetch

You are right. There is a little difference: fetch doesn't pass the hash
to its block. But one can "abuse" both with an assignment in the block:

| dat = hash.fetch(k) { | key | hash[key] = create_dat(key) }

mfg, simon .... l

comp.lang.ruby

Re: Are all Ruby built-in objects thread safe?

Yukihiro Matsumoto

Charles Oliver Nutter

Robert Dober

Robert Klemme

Robert Dober

Simon Krahnke

Robert Dober

Robert Klemme

Robert Klemme

Robert Dober

Simon Krahnke

x Login to ForumsZone