Eric Hodel
8/11/2007 1:25:00 AM
On Aug 9, 2007, at 23:48, Daniel Azuma wrote:
> I believe I've found a race condition in Rinda. But it seems a little
> "too easy", so I wonder if I'm missing something. I'd like a second
> opinion from someone more familiar with the implementation before I
> file
> a bug report.
>
> The issue is in Rinda::TupleSpace#move (and also #write and #read).
> Each
> of those methods calls Rinda::TupleSpace#start_keeper to start the
> "keeper" thread that checks periodically for expired tuples.
> Unfortunately, I believe the thread is started too early. As a
> consequence, it can terminate prematurely and leave the TupleSpace
> with
> expiring tuples and no thread to clear them out. (e.g. calls fail to
> time out properly, etc.)
>
> Here is the sequence of events, based on ruby 1.8.6p0 (but it seems
> that
> trunk has not changed since then).
>
> ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
> # Don't write anything into the TupleSpace before...
> tup = ts.take([:foo], 1) # <- hangs instead of timing out
>
> Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
> called. It starts the thread (line 568). If the thread is scheduled
> immediately, it will immediately check need_keeper? (line 579) and,
> discovering that the TupleSpace is empty, it promptly terminates.
> Later,
> the main thread pushes the WaitTemplateEntry (with the 1 second
> timeout)
> onto @take_waiter (line 454), expecting that the keeper thread will
> collect it after the entry expires. But the keeper thread is already
> gone. Thus, the wait template never dies, and the take call never
> returns-- at least until someone else comes in and starts another
> keeper
> thread.
>
> Shouldn't start_keeper be called AFTER pushing the wait template onto
> @take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
> 486, and similarly also in #write? Am I missing something?
I think you are right. I have CC'd Masatoshi SEKI.
> Furthermore, it doesn't seem that start_keeper should care about
> remaining outside the synchronize sections in those methods. In
> fact, if
> anything, maybe it should be INSIDE the monitor lock to prevent
> interleaved calls to start_keeper (and resultant spawning of multiple
> keeper threads).
At worst, a separate synchronize block could be put in start_keeper.
--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars