Asp Forum - DRb and Thread safety

Erwin Abbott

5/31/2007 2:30:00 AM

I've been reading about DRb and one issue that I still have questions
about is thread safety. Every website I've come across uses very
basic examples where the client never does anything that would modify
the distributed object.

For sake of discussion, let's say I'm writing an application that's
some kind of database. Clients can add, delete, search, or modify
records. How can this be written so clients can't step on each other,
or block the whole database while perfoming and update on a single
record? Having a mutex for every record doesn't seem very elegant.

I'm also still getting aquainted with threads, and I want to be sure I
have this right: each thread has it's own "local" variables, so if two
threads have access to the same array and they both call array.max...
any local vars used in the #max method won't be "stepped on" by the
other thread. As long as #max doesn't change any variables outside its
method scope, it will be thread safe?

Thanks,
Erwin

9 Answers

Eric Hodel

5/31/2007 3:45:00 AM

On May 30, 2007, at 19:29, Erwin Abbott wrote:

> I've been reading about DRb and one issue that I still have questions
> about is thread safety. Every website I've come across uses very
> basic examples where the client never does anything that would modify
> the distributed object.
>
> For sake of discussion, let's say I'm writing an application that's
> some kind of database. Clients can add, delete, search, or modify
> records. How can this be written so clients can't step on each other,
> or block the whole database while perfoming and update on a single
> record? Having a mutex for every record doesn't seem very elegant.

Having your database be a Rinda::TupleSpace is one way around it.

> I'm also still getting aquainted with threads, and I want to be sure I
> have this right: each thread has it's own "local" variables, so if two
> threads have access to the same array and they both call array.max...
> any local vars used in the #max method won't be "stepped on" by the
> other thread. As long as #max doesn't change any variables outside its
> method scope, it will be thread safe?

Correct.

Brian Candler

5/31/2007 7:12:00 AM

On Thu, May 31, 2007 at 11:29:46AM +0900, Erwin Abbott wrote:
> I've been reading about DRb and one issue that I still have questions
> about is thread safety. Every website I've come across uses very
> basic examples where the client never does anything that would modify
> the distributed object.

There's some examples which do this at
http://wiki.rubygarden.org/Ruby/page/show/D...

Unfortunately Rubygarden is down more often than it's up (it's down as I
write this), but you can get a cached version through Google.

> For sake of discussion, let's say I'm writing an application that's
> some kind of database. Clients can add, delete, search, or modify
> records. How can this be written so clients can't step on each other,
> or block the whole database while perfoming and update on a single
> record?

Well, that's the general locking problem in a nutshell, and many books have
been written about this :-)

The solution will be domain-specific. The simplest solution is to serialise
all requests on the server side using a global mutex:

class DatabaseServer
def initialize
@mutex = Mutex.new
end

def add(r)
@mutex.synchronize do
# stuff
end
end

def search(r)
@mutex.synchronize do
# stuff
end
end

# etc
end

This is a reasonable approach if all the actions like add, search are
CPU-bound, because having concurrency here won't gain you anything.

If some of these actions can block on external I/O, then you might want to
have more concurrency. So you can look at having a shared read lock /
exclusive write lock kind of model; or you can push the problem downstream
(that is, write your app using some pre-existing thread-safe backend store,
and then just let multiple clients run your code concurrently)

B.

Erwin Abbott

5/31/2007 8:04:00 AM

On 5/30/07, Eric Hodel <drbrain@segment7.net> wrote:
> Having your database be a Rinda::TupleSpace is one way around it.

Hi Eric, I was hoping you would respond (I came across your site and a
few discussions on ruby-talk while researching DRb)! From what I've
read about Rinda::TupleSpace, it seems similar to a Queue. The
difference, as far as I understand, is the idea of a template that
lets us choose to only pop certain kinds of items off the queue
(instead of maintaining several different queues). Maybe you can fill
me in if I overlooked something, I'm not familiar with Linda so the
whole idea is a bit foreign to me.

I'm probably just not developed the mindset for using TupleSpaces yet,
but I've only seen examples where some data is sent off to some
"worker bee" client, and then it's written back to the TupleSpace. I'm
pretty sure I understand those examples but I can't figure out how
this will apply to my application.

I'm hoping to be able to do something like db.search(params) => [a, b,
c, d, ...] where this is an array of records. Then calling methods
that modify the record like a.delete! or a.name = 'Harry' would take
care of locking the record. Calling db.store Record.new(1, 2, 3) would
add a new record to the database. Of course "db" is the object
connected with DRb.

Maybe you can give a brief explanation of how to use Rinda::TupleSpace
in this case? I can maybe see records being stored on the TupleSpace
when they are updated, but I don't really get it yet. Would searches
be done using ts.read using the template to provide search parameters?
I was thinking the "database" object would be more abstract than a big
table of records, so comparing "fields" with == or === might not be
flexible enough (that's how the templates are matched, right?). Maybe
it's an address book, and there's a method to find customers that
haven't payed their bills from last month?

- Erwin

Erwin Abbott

5/31/2007 8:32:00 AM

On 5/31/07, Brian Candler <B.Candler@pobox.com> wrote:
> There's some examples which do this at
> http://wiki.rubygarden.org/Ruby/page/show/D...
>
> Unfortunately Rubygarden is down more often than it's up (it's down as I
> write this), but you can get a cached version through Google.

Glad you mentioned the Google cache... I kept finding links to that
page and wondered what I was missing. I'll be reading that next.

> Well, that's the general locking problem in a nutshell, and many books have
> been written about this :-)

I'm very interested in a book on programming with threads, any
recommendations? I'll probably browse the bookstore this weekend. I'm
afraid everything will be focused on specifics like language/OS rather
than general concepts. It's also hard to know what's good from reading
reviews on Amazon.

> The solution will be domain-specific. The simplest solution is to serialise
> all requests on the server side using a global mutex:
>
> This is a reasonable approach if all the actions like add, search are
> CPU-bound, because having concurrency here won't gain you anything.

The reason for using synchronization in #search is because we might
start in the middle of a #write call from another thread, right? I had
something in mind like an SQL database which would search using the
"not-yet-written" version of the data... but now I think I see that's
quite complicated.

I also think I just came to a realization. In the threading examples
I've seen, there's typically a "sleep 5" as a placeholder for some big
computation. But this simulates an I/O bound process instead of
something computationally intesive. So processing 10 items each with
their own thread (which just sleeps for 5 seconds) will finish in
close to 5 seconds... a huge improvement over processing them
serially. But processing the items was actually CPU intensive, would
it be more like 50 seconds?

> If some of these actions can block on external I/O, then you might want to
> have more concurrency. So you can look at having a shared read lock /
> exclusive write lock kind of model; ...

I'd like to learn the design concepts involved with that, just to gain
an appreciation for the complexity of it. Is my best bet to look at
the source code of something like an SQL server? Maybe one of the
Berkeley DB variants or SQLite would be a good starting point (or
maybe a multiuser DB if these aren't multithreaded). Thanks for your
help!

Erwin

Marcin Raczkowski

5/31/2007 9:43:00 AM

On Thursday 31 May 2007 02:29, Erwin Abbott wrote:
> I've been reading about DRb and one issue that I still have questions
> about is thread safety. Every website I've come across uses very
> basic examples where the client never does anything that would modify
> the distributed object.
>
> For sake of discussion, let's say I'm writing an application that's
> some kind of database. Clients can add, delete, search, or modify
> records. How can this be written so clients can't step on each other,
> or block the whole database while perfoming and update on a single
> record? Having a mutex for every record doesn't seem very elegant.
>
> I'm also still getting aquainted with threads, and I want to be sure I
> have this right: each thread has it's own "local" variables, so if two
> threads have access to the same array and they both call array.max...
> any local vars used in the #max method won't be "stepped on" by the
> other thread. As long as #max doesn't change any variables outside its
> method scope, it will be thread safe?
>
> Thanks,
> Erwin

Rinda::TupleSpace is just an array of arrays with few sophisticated functions
that let you find tuples (arrays) with one element matching.

if you want thread safety write your own monitors or just use mutexes

also you can use already thread safe classes like Queue

greets

--
Marcin Raczkowski
---
Friends teach what you should know
Enemies Teach what you have to know

Brian Candler

5/31/2007 10:52:00 AM

On Thu, May 31, 2007 at 05:31:53PM +0900, Erwin Abbott wrote:
> >Well, that's the general locking problem in a nutshell, and many books have
> >been written about this :-)
>
> I'm very interested in a book on programming with threads, any
> recommendations? I'll probably browse the bookstore this weekend. I'm
> afraid everything will be focused on specifics like language/OS rather
> than general concepts. It's also hard to know what's good from reading
> reviews on Amazon.

I have a good book on principles of operating systems which talks a lot
about synchronisation; unfortunately it's in a different location right now
and I can't remember the exact title. It's pretty old.

> The reason for using synchronization in #search is because we might
> start in the middle of a #write call from another thread, right? I had
> something in mind like an SQL database which would search using the
> "not-yet-written" version of the data... but now I think I see that's
> quite complicated.

No, that's quite doable as well. What you need to do is to protect the part
where you replace @data with new_data, to ensure there are no readers of
@data while you're changing it.

To do this you need a shared lock for the readers, and for the writer to
gain an exclusive lock at the point where it wants to swap the data. While
any shared locks are held, the exclusive lock will not be granted; and while
an exclusive lock is held no shared or exclusive locks will be granted.

Have a look at sync.rb in the standard Ruby distribution, although I've not
used it in anger myself. Something like this might do the trick:

require 'sync'

class DatabaseServer

def initialize
@sync = Sync.new
@data = {}
end

def search(k)
@sync.synchronize(Sync::SH) {
puts "Reading..."
sleep 1
puts "Read done"
@data[k]
}
end

def update(k,v)
@sync.synchronize(Sync::EX) {
puts "Updating..."
@data[k] = v
sleep 2
puts "Update done"
}
end
end

d = DatabaseServer.new

t1 = Thread.new {
sleep 0.5
d.update("foo","bar")
}
t2 = Thread.new {
puts "t2(a): #{d.search("foo")}"
puts "t2(b): #{d.search("foo")}"
}
t1.join
t2.join

Note that this is an artificial example, since reading an element from a
hash or writing an element to a hash are 'atomic' operations as far as the
Ruby thread scheduler is concerned. That is, in this simplistic case,
everything will work fine without explicit synchronisation. But as soon as
your searches or updates become multi-step operations (e.g. a search
involves a lookup in index I followed by a read of record R) then you need
this, so that for any particular operation, the index and the records are in
a consistent state throughout. The same applies for write-after-read
operations, such as @id=@id+1.

You'll also need to be careful about what you return to your clients. Don't
return a reference to some object which may be mutated later; use 'dup' to
make a copy for the client if necessary.

> I also think I just came to a realization. In the threading examples
> I've seen, there's typically a "sleep 5" as a placeholder for some big
> computation. But this simulates an I/O bound process instead of
> something computationally intesive. So processing 10 items each with
> their own thread (which just sleeps for 5 seconds) will finish in
> close to 5 seconds... a huge improvement over processing them
> serially. But processing the items was actually CPU intensive, would
> it be more like 50 seconds?

Yes.

However, using sleep also helps you simulate race conditions, where thread 1
does X followed by Y, and you want to see what happens if thread 2 does Z in
between X and Y. That's what I'm using it for above. You can use this to
demonstrate that two or more threads can successfully obtain a shared lock
at the same time, for instance.

BTW there's a good introductory chapter in Programming Ruby:
http://www.rubycentral.com/book/tut_th...

And you might want to look at 'madeleine', which is an in-RAM object
database written in Ruby.

Regards,

Brian.

Brian Candler

5/31/2007 11:07:00 AM

On Thu, May 31, 2007 at 11:51:27AM +0100, Brian Candler wrote:
> > The reason for using synchronization in #search is because we might
> > start in the middle of a #write call from another thread, right? I had
> > something in mind like an SQL database which would search using the
> > "not-yet-written" version of the data... but now I think I see that's
> > quite complicated.
>
> No, that's quite doable as well. What you need to do is to protect the part
> where you replace @data with new_data, to ensure there are no readers of
> @data while you're changing it.

Hmm, the code I gave didn't actually do this :-) But it did allow multiple
readers while there are no changes taking place.

Considering alternatives reveals some of the pitfalls. For example, you can
conceive of a strategy where the update thread does the following:

1. grabs a shared read lock
2. makes a local copy of the data and modifies it
3. upgrades its lock to exclusive
4. replaces the stored data with its modified copy
5. drops its lock

However this fails totally if two update threads come along at the same
time. They both successfully grab the read lock, both make their local
modifications, and then both hang forever waiting to get an exclusive lock,
which they can't do while the other still holds a read lock. This is known
as a "deadlock" scenario.

This is where a good book comes in. There are some relevant articles on
wikipedia. Googling for "philosophers" and "spaghetti" may also be helpful
:-)

Regards,

Brian.

Erwin Abbott

6/1/2007 9:43:00 AM

Marcin and Brian,

Thank you for the help, I feel like I have a pretty good starting
place now to figure the details out myself. I'm still not sure what
the differences between all the data structures like Mutex, Condition
Variable, Semaphore, Monitor, etc are... but I can work that out.

On 5/31/07, Brian Candler <B.Candler@pobox.com> wrote:
> This is where a good book comes in. There are some relevant articles on
> wikipedia. Googling for "philosophers" and "spaghetti" may also be helpful :-)

Yes, very helpful. It also led me to:
- Sleeping barber problem
- Cigarette smokers problem
- Dining cryptographers protocol

The Concurrency category on Wikipedia should give me plenty to read,
or at least know what to look for at the bookstore...
http://en.wikipedia.org/wiki/Category:C...

Thanks again,
Erwin

Marcin Raczkowski

6/1/2007 10:53:00 AM

On Friday 01 June 2007 09:42, Erwin Abbott wrote:
> Marcin and Brian,
>
> Thank you for the help, I feel like I have a pretty good starting
> place now to figure the details out myself. I'm still not sure what
> the differences between all the data structures like Mutex, Condition
> Variable, Semaphore, Monitor, etc are... but I can work that out.

Mutex - is block that can't be run concurently
Semaphore - is like "stop" sign - execution will stop untill semaphore is
lifted in another thread
Monitor - is class that throught specially made accessors allows only
synchronous access to it's methods/variables (Queue class in standard lib is
good example)
Condition Variable is something oposite to Semaphore

>
> On 5/31/07, Brian Candler <B.Candler@pobox.com> wrote:
> > This is where a good book comes in. There are some relevant articles on
> > wikipedia. Googling for "philosophers" and "spaghetti" may also be
> > helpful :-)
>
> Yes, very helpful. It also led me to:
> - Sleeping barber problem
> - Cigarette smokers problem
> - Dining cryptographers protocol
all these algorithms are very complicated in c - but in ruby are usually
really easy - you only need to use thread safe classes (like Queue) or
Monitor mixin

for example in ruby solving Writers and Readers

require 'thread'

bookstore = Queue.new

writers = 4
readers = 5

writers.times do |w|
Thread.new {
5.times { |x|
bookstore.put("new book "+x.to_s+" by "+w)
sleep 1+rand(5)
}
}
end

readers.times do
Thread.new {
4.times {
puts bookstore.pop
}
}
end

>
> The Concurrency category on Wikipedia should give me plenty to read,
> or at least know what to look for at the bookstore...
> http://en.wikipedia.org/wiki/Category:C...
>

I'm not sure what's the most popular book about algorithmics in you country -
but try spending some money or go to library ^^

> Thanks again,
> Erwin

--
Marcin Raczkowski
---
Friends teach what you should know
Enemies Teach what you have to know

comp.lang.ruby

DRb and Thread safety

Erwin Abbott

Eric Hodel

Brian Candler

Erwin Abbott

Erwin Abbott

Marcin Raczkowski

Brian Candler

Brian Candler

Erwin Abbott

Marcin Raczkowski

x Login to ForumsZone