Asp Forum - Why not call Thread.join?

fedzor

12/31/2007 5:02:00 AM

Take this code from the Ruby Cookbook:

module Enumerable
  def each_simultaneously
    threads = []
    each { |e| threads << Thread.new { yield e } }
    return threads
  end
end

It is used on an array so that you may do this:
[1,2,3].each_simultaneously do |i|
  sleep 5
  puts i
end

And it works!

But why don't I need to call threads.each {|t| t.join }?

And if I did, would it slow it down?

Thanks,
Ari
-------------------------------------------|
Nietzsche is my copilot

12 Answers

skye.shaw

12/31/2007 5:45:00 AM

On Dec 30, 9:02 pm, thefed <fed...@gmail.com> wrote:
> Take this code from the Ruby Cookbook:
>
> module Enumerable
> def each_simultaneously
> threads = []
> each { |e| threads << Thread.new { yield e } }
> return threads
> end
> end
>
> It is used on an array so that you may do this:
> [1,2,3].each_simultaneously do |i|
> sleep 5
> puts i
> end
>
> And it works!

What did you expect to happen?
The example you provided will do nothing but create threads and
exit.

> But why don't I need to call threads.each {|t| t.join }?

Any running threads are killed when the program exits.

> And if I did, would it slow it down?

Generally speaking, the only thing it would slow down (stop really) is
the execution path of the main thread.

Now if for some reason your main thread has to do other work, a join
would delay that, of course.
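
A minimal sketch of the "killed when the program exits" point, assuming a `ruby` executable is available to launch as a child process. The helper name run_script is made up for this illustration; Open3.capture2 and RbConfig.ruby are standard library calls.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a one-line Ruby script in a child interpreter
# and return whatever it wrote to stdout.
def run_script(source)
  output, _status = Open3.capture2(RbConfig.ruby, '-e', source)
  output
end

# The main thread ends right away, so the sleeping thread is killed
# before it gets a chance to print.
without_join = run_script('Thread.new { sleep 1; puts "done" }')

# Here the main thread waits for the worker, so the message appears.
with_join = run_script('Thread.new { sleep 0.2; puts "done" }.join')

puts "without join: #{without_join.inspect}"
puts "with join:    #{with_join.inspect}"
```

The first child process should produce no output at all, which matches what happens when the original example is run as a script rather than typed into IRB.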

Robert Klemme

12/31/2007 3:07:00 PM

On 31.12.2007 06:45, Skye Shaw!@#$ wrote:
> On Dec 30, 9:02 pm, thefed <fed...@gmail.com> wrote:
>> Take this code from the Ruby Cookbook:
>>
>> module Enumerable
>> def each_simultaneously
>> threads = []
>> each { |e| threads << Thread.new { yield e } }
>> return threads
>> end
>> end
>>
>> It is used on an array so that you may do this:
>> [1,2,3].each_simultaneously do |i|
>> sleep 5
>> puts i
>> end
>>
>> And it works!
>
>
> What did you expect to happen?
> The example you provided will do nothing but create threads and
> exit.
>
>> But why don't I need to call threads.each {|t| t.join }?
>
> Any running threads are killed when the program exits.
>
>
>> And if I did, would it slow it down?
>
> Generally speaking, the only thing it would slow down (stop really) is
> the execution path of the main thread.
>
> Now if for some reason your main thread has to do other work, a join
> would delay that, of course.

Nevertheless it's good practice to join. If main has other work to do
then you should join once that is done, i.e. at the end of the script.
If those threads have terminated already you basically only have the
overhead of the Threads Array iteration - but you get robustness in
return, i.e. you ensure that all those Threads can terminate properly
(assuming that they are written in a way to do that eventually).
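
As a rough sketch of that advice (the variable names and the stand-in workload are invented for illustration): start the threads, let main do its own work, and join only at the end.

```ruby
require 'thread' # harmless on modern Rubies, needed for Queue on old ones

results = Queue.new

workers = [1, 2, 3].map do |i|
  Thread.new do
    sleep 0.1            # stand-in for a slow operation
    results << i * 10
  end
end

# The main thread does its own work while the workers run.
main_work = (1..1000).reduce(:+)

# Join at the very end. Threads that have already finished return
# from join immediately, so this costs little more than the iteration.
workers.each(&:join)

collected = []
collected << results.pop until results.empty?
puts "main computed #{main_work}, workers produced #{collected.sort.inspect}"
```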

Kind regards

robert

fedzor

12/31/2007 4:03:00 PM

On Dec 31, 2007, at 12:49 AM, Skye Shaw!@#$ wrote:

> Generally speaking, the only thing it would slow down (stop really) is
> the execution path of the main thread.
>
> Now if for some reason your main thread has to do other work, a join
> would delay that, of course.

OK, I understand it better. But why does each {|t| t.join} join them
all at the same time (ish), and not wait for the first one to finish
executing before joining the others?

Robert Klemme

12/31/2007 4:10:00 PM

On 31.12.2007 17:02, thefed wrote:
> On Dec 31, 2007, at 12:49 AM, Skye Shaw!@#$ wrote:
>
>> Generally speaking, the only thing it would slow down (stop really) is
>> the execution path of the main thread.
>>
>> Now if for some reason your main thread has to do other work, a join
>> would delay that, of course.
>
> OK, I understand it better. But why does each {|t| t.join} join them
> all at the same time (ish), and not wait for the first one to finish
> executing before joining the others?

They are not joined at the same time but one after the other.

Cheers

robert

Ken Bloom

12/31/2007 5:37:00 PM

On Mon, 31 Dec 2007 00:02:10 -0500, thefed wrote:

> Take this code from the Ruby Cookbook:
>
> module Enumerable
> def each_simultaneously
> threads = []
> each { |e| threads << Thread.new { yield e } }
> return threads
> end
> end
>
> It is used on an array so that you may do this:
> [1,2,3].each_simultaneously do |i|
> sleep 5
> puts i
> end

When I ran this (not in IRB) it didn't work. The interpreter terminated
before any of the threads finished sleeping for 5 seconds. In any case,
you want to join each thread so that the next statement will only execute
after all of the threads have finished their work (otherwise your next
statement will see an undetermined intermediate view of the array).

> OK, I understand it better. But why does each {|t| t.join} join them
> all at the same time (ish), and not wait for the first one to finish
> executing before joining the others?

It joins them one at a time in order. But while your main thread is
waiting for a specific thread to finish, any other thread is also allowed
to execute, and possibly terminate. If thread b terminates while thread a
is joined, then you call join on thread b, join will return immediately
since there's nothing to wait for. Hence, each{|t| t.join} finishes
practically immediately when the longest running thread finishes.
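
A small sketch of this, with arbitrary sleep durations picked only for illustration: the joins run in array order, yet the total wait is close to the longest sleep rather than the sum.

```ruby
durations = [0.4, 0.1, 0.2] # seconds; arbitrary values for illustration

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = durations.map { |d| Thread.new { sleep d } }

# Joins happen one after the other, in array order. While main waits on
# the first (longest) thread, the other two finish on their own.
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format('sum of sleeps: %.1fs, longest: %.1fs', durations.sum, durations.max)
puts format('elapsed: about %.1fs', elapsed)
```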

--Ken

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

fedzor

12/31/2007 5:57:00 PM

On Dec 31, 2007, at 11:15 AM, Robert Klemme wrote:

> On 31.12.2007 17:02, thefed wrote:

>> OK, I understand it better. But why does each {|t| t.join} join
>> them all at the same time (ish), and not wait for the first one
>> to finish executing before joining the others?
>
> They are not joined at the same time but one after the other.

But then why doesn't this take 15 seconds? t.join is called in the
main thread, so shouldn't the next Thread#join not get called until
the first one finishes?

module Enumerable
def each_simultaneously
threads = []
each { |e| threads >> Thread.new { yield e } }
return threads
end
end

start_time = Time.now
[7,8,9].each_simultaneously do |e|
sleep(5) # Simulate a long, high-latency operation
print "Completed operation for #{e}!\n"
end
# Completed operation for 8!
# Completed operation for 7!
# Completed operation for 9!
Time.now - start_time # => 5.009334

fedzor

12/31/2007 6:02:00 PM

> module Enumerable
> def each_simultaneously
> threads = []
> each { |e| threads << Thread.new { yield e } }
> return threads
> end
> end

Sorry all, THIS is the fixed-up version of each_simultaneously (<< in
place of >>). Turns out the Ruby Cookbook has errors, too!

Craig Beck

12/31/2007 8:47:00 PM

>>> OK, I understand it better. But why does each {|t| t.join} join
>>> them all at the same time (ish), and not wait for the first one
>>> to finish executing before joining the others?
>>
>> They are not joined at the same time but one after the other.
>
> But then why doesn't this take 15 seconds? t.join is called in the
> main thread, so shouldn't the next Thread#join not get called until
> the first one finishes?
>
> module Enumerable
> def each_simultaneously
> threads = []
> each { |e| threads >> Thread.new { yield e } }
> return threads
> end
> end
>
> start_time = Time.now
> [7,8,9].each_simultaneously do |e|
> sleep(5) # Simulate a long, high-latency operation
> print "Completed operation for #{e}!\n"
> end
> # Completed operation for 8!
> # Completed operation for 7!
> # Completed operation for 9!
> Time.now - start_time # => 5.009334

try looking at the crude timeline below...

sec   0         1         2         3         4         5         6         7
      |---------|---------|---------|---------|---------|---------|---------|
main  ====@=================================================
t[1]    ===================================================
t[2]    ===================================================
t[3]    ===================================================
The @ on the main thread represents when the t.join gets called. It
waits in this simple case for t[1] to finish its work (sleeping for 5
seconds), then waits for t[2]. As t[2] has also been doing work all
this time, it only blocks the main thread for another 0.1 sec before
finishing. Same for t[3]. So in this contrived example it takes 5
seconds + whatever overhead for starting threads.

You could throw more instrumentation in there if you wish and do
things like adding additional calls to sleep to simulate extra thread
overhead to make it more obvious.
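
One possible bit of instrumentation, sketched with shorter sleeps so it runs quickly (the helper name now is made up here): time how long main is blocked inside each individual join.

```ruby
# Hypothetical helper: monotonic clock reading in seconds.
def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

threads = [7, 8, 9].map { |e| Thread.new { sleep 0.3; e } }

# Record how long the main thread is blocked inside each join call.
waits = threads.map do |t|
  before = now
  t.join
  now - before
end

waits.each_with_index do |w, i|
  puts format('join on t[%d] blocked for %.3fs', i + 1, w)
end
```

The first join should account for nearly all of the waiting, and the later ones should return almost at once, which is the picture the timeline draws.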

fedzor

12/31/2007 9:53:00 PM

On Dec 31, 2007, at 3:46 PM, Craig Beck wrote:

> try looking at the crude timeline below...
>
> sec   0         1         2         3         4         5         6         7
>       |---------|---------|---------|---------|---------|---------|---------|
> main  ====@=================================================
> t[1]    ===================================================
> t[2]    ===================================================
> t[3]    ===================================================
>
> The @ on the main thread represents when the t.join gets called. It
> waits in this simple case for t[1] to finish its work (sleeping
> for 5 seconds), then waits for t[2]. As t[2] has also been doing
> work all this time, it only blocks the main thread for another 0.1
> sec before finishing. Same for t[3]. So in this contrived example it
> takes 5 seconds + whatever overhead for starting threads.
>
> You could throw more instrumentation in there if you wish and do
> things like adding additional calls to sleep to simulate extra
> thread overhead to make it more obvious.

Thank you SO MUCH! This really clears threading up for me. In
retrospect it was less than obvious, but evident nonetheless. But
this timeline really made the difference for me. Thank you!

- Ari

Ian Whitlock

1/1/2008 2:25:00 AM

Craig Beck wrote:
>> module Enumerable
>> print "Completed operation for #{e}!\n"
>> end
>> # Completed operation for 8!
>> # Completed operation for 7!
>> # Completed operation for 9!
>> Time.now - start_time # => 5.009334
>
> try looking at the crude timeline below...
>
> sec   0         1         2         3         4         5         6         7
>       |---------|---------|---------|---------|---------|---------|---------|
> main  ====@=================================================
> t[1]    ===================================================
> t[2]    ===================================================
> t[3]    ===================================================
>
> The @ on the main thread represents when the t.join gets called. It
> waits in this simple case for t[1] to finish its work (sleeping for 5
> seconds), then waits for t[2]. As t[2] has also been doing work all
> this time, it only blocks the main thread for another 0.1 sec before
> finishing. Same for t[3]. So in this contrived example it takes 5
> seconds + whatever overhead for starting threads.
>
> You could throw more instrumentation in there if you wish and do
> things like adding additional calls to sleep to simulate extra thread
> overhead to make it more obvious.

To me the important point in addition to the parallelism is that, when
run in batch mode, say with SciTE, main takes less than a second and
kills all the threads. Hence the messages are never seen. To see
the reports you have to do something like

start_time = Time.now
[7,8,9].each_simultaneously do |e|
  sleep(5) # Simulate a long, high-latency operation
  print "Completed operation for #{e}!\n"
end
sleep 5 # main must take at least 5 seconds!
# Completed operation for 8!
# Completed operation for 7!
# Completed operation for 9!
Time.now - start_time # => 5.009334

to guarantee that the threads have 5 seconds to finish
their operation. Or you can use

module Enumerable
  def each_simultaneously
    collect {|e| Thread.new {yield e}}.each {|t| t.join}
  end
end

which guarantees that the threads will finish before
control is returned to main.
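
A quick way to check that behaviour, as a sketch only. The method name each_simultaneously_and_wait is invented here so it does not clash with the versions above, and the sleep is shortened to keep the run brief.

```ruby
module Enumerable
  # Hypothetical name for the joining variant: one thread per element,
  # and the call returns only after every thread has been joined.
  def each_simultaneously_and_wait
    map { |e| Thread.new { yield e } }.each(&:join)
  end
end

completed = []
lock = Mutex.new # guards the shared array against concurrent appends

[7, 8, 9].each_simultaneously_and_wait do |e|
  sleep 0.2 # stand-in for a long, high-latency operation
  lock.synchronize { completed << e }
end

# By the time the call returns, every block has run to completion,
# so no extra sleep is needed in main.
puts "Completed: #{completed.sort.inspect}"
```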

In practice the speed-up also depends on the threads spending a
large part of their time just waiting (as with sleep here),
especially when there is only one CPU.

I think the problem arose because the example on page 760
of the Ruby Cookbook does not mention the necessity of the
main thread lasting long enough and does not show code to
make it happen.

I realize that much of this may have been obvious to some
who replied, but as a newbie it wasn't to me until I read
the section and played with the code.

Ian
--
Posted via http://www.ruby-....

comp.lang.ruby
