comp.lang.ruby

[ANN] forkoff - parallel processing for ruby enumerables

ara.t.howard

4/18/2008 1:43:00 AM


NAME

forkoff

SYNOPSIS

brain-dead simple parallel processing for ruby

URI

http://rubyforge.org/projects/cod...

INSTALL

gem install forkoff

DESCRIPTION

forkoff works for any enumerable object, iterating a code block to run in a
child process and collecting the results. forkoff can limit the number of
child processes which is, by default, 8.

SAMPLES

<========< samples/a.rb >========>

~ > cat samples/a.rb

#
# forkoff makes it trivial to do parallel processing with ruby, the following
# prints out each word in a separate process
#

require 'forkoff'

%w( hey you ).forkoff!{|word| puts "#{ word } from #{ Process.pid }"}

~ > ruby samples/a.rb

hey from 3239
you from 3240


<========< samples/b.rb >========>

~ > cat samples/b.rb

#
# for example, this takes only 1 second or so to complete
#

require 'forkoff'

a = Time.now.to_f

results =
  (0..7).forkoff do |i|
    sleep 1
    i ** 2
  end

b = Time.now.to_f

elapsed = b - a

puts "elapsed: #{ elapsed }"
puts "results: #{ results.inspect }"

~ > ruby samples/b.rb

elapsed: 1.07044386863708
results: [0, 1, 4, 9, 16, 25, 36, 49]


<========< samples/c.rb >========>

~ > cat samples/c.rb

#
# forkoff does *NOT* spawn processes in batches, waiting for each batch to
# complete. rather, it keeps a certain number of processes busy until all
# results have been gathered. in other words the following will ensure that 2
# processes are running at all times, until the list is complete. note that
# the following will take about 2 seconds to run (2 sets of 2 @ 1 second).
#

require 'forkoff'

pid = Process.pid

a = Time.now.to_f

pstrees =
  %w( a b c d ).forkoff! :processes => 2 do |letter|
    sleep 1
    { letter => ` pstree -l 2 #{ pid } ` }
  end


b = Time.now.to_f

puts
puts "pid: #{ pid }"
puts "elapsed: #{ b - a }"
puts

require 'yaml'

pstrees.each do |pstree|
y pstree
end

~ > ruby samples/c.rb


pid: 3254
elapsed: 2.12998485565186

---
a: |
-+- 03254 ahoward ruby -Ilib samples/c.rb
|-+- 03255 ahoward ruby -Ilib samples/c.rb
\-+- 03256 ahoward ruby -Ilib samples/c.rb

---
b: |
-+- 03254 ahoward ruby -Ilib samples/c.rb
|-+- 03255 ahoward ruby -Ilib samples/c.rb
\-+- 03256 ahoward ruby -Ilib samples/c.rb

---
c: |
-+- 03254 ahoward ruby -Ilib samples/c.rb
|-+- 03261 ahoward (ruby)
\-+- 03262 ahoward ruby -Ilib samples/c.rb

---
d: |
-+- 03254 ahoward ruby -Ilib samples/c.rb
|-+- 03261 ahoward ruby -Ilib samples/c.rb
\-+- 03262 ahoward ruby -Ilib samples/c.rb
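
The elapsed times in samples b and c can be estimated with simple arithmetic when every item takes the same time. The sketch below is only a rough model: `estimated_seconds` is a made-up helper (not part of forkoff), and it ignores the cost of forking and of passing results back, which is why the measured times come out slightly higher.

```ruby
# Rough estimate, assuming every item takes the same time and
# fork/communication overhead is negligible.
def estimated_seconds(items, processes, seconds_per_item)
  (items.to_f / processes).ceil * seconds_per_item
end

puts estimated_seconds(8, 8, 1)   # sample b: 8 items, default of 8 processes
puts estimated_seconds(4, 2, 1)   # sample c: 4 items, :processes => 2
```

This prints 1 and 2, in line with the measured 1.07 and 2.13 seconds above.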





a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama




20 Answers

Roger Pack

4/18/2008 1:52:00 AM


>
> NAME
>
> forkoff
>

Nice. Great idea.
>
> # forkoff does *NOT* spawn processes in batches, waiting for each batch to
> # complete. rather, it keeps a certain number of processes busy until all
> # results have been gathered. in other words the following will ensure that 2
> # processes are running at all times, until the list is complete. note that
> # the following will take about 2 seconds to run (2 sets of 2 @ 1 second).
> #

I assume then that at most 2 processes are forked, and each keeps
working?

ara.t.howard

4/18/2008 2:38:00 AM



On Apr 17, 2008, at 7:51 PM, Roger Pack wrote:
> I assume then that at most 2 processes are forked, and each keeps
> working?

right now it's 8 - 2 is more reasonable. at this point this code is
fully proof of concept - i'll take that as a suggestion (that i agree
with)

cheers.





Phillip Gawlowski

4/18/2008 2:48:00 AM


ara howard wrote:
|
| NAME
|
| forkoff
|
| SYNOPSIS
|
| brain-dead simple parallel processing for ruby
|
| URI
|
| http://rubyforge.org/projects/cod...
|
| INSTALL
|
| gem install forkoff
|
| DESCRIPTION
|
| forkoff works for any enumerable object, iterating a code block to run in a
| child process and collecting the results. forkoff can limit the number of
| child processes which is, by default, 8.

So, the tool that captures run away processes and terminates them will
be called 'sodoff', I wager? :P

SCNR

--
Phillip Gawlowski
Twitter: twitter.com/cynicalryan

You thought I was taking your woman away from you. You're jealous.
You tried to kill me with your bare hands. Would a Kelvan do that?
Would he have to? You're reacting with the emotions of a human.
You are human.
~ -- Kirk, "By Any Other Name," stardate 4657.5

ara.t.howard

4/18/2008 2:54:00 AM



On Apr 17, 2008, at 8:48 PM, Phillip Gawlowski wrote:
>
> So, the tool that captures run away processes and terminates them will
> be called 'sodoff', I wager? :P

oh yeah, that's good - taken!






Phillip Gawlowski

4/18/2008 3:40:00 AM


ara.t.howard wrote:
|
| On Apr 17, 2008, at 8:48 PM, Phillip Gawlowski wrote:
|>
|> So, the tool that captures run away processes and terminates them will
|> be called 'sodoff', I wager? :P
|
| oh yeah, that's good - taken!

I want credit. Dollars aren't worth a dime. :P

--
Phillip Gawlowski
Twitter: twitter.com/cynicalryan

"You speak truth," said Themistocles; "I should never have been famous
if I had been of Seriphus"
~ -- Plutarch (46-120 AD)
~ -- Life of Themistocles

Piyush Ranjan

4/18/2008 4:58:00 AM


I think this is a great idea!
Kudos


Martin DeMello

4/18/2008 8:00:00 AM


On Thu, Apr 17, 2008 at 6:43 PM, ara howard <ara.t.howard@gmail.com> wrote:
> DESCRIPTION
>
> forkoff works for any enumerable object, iterating a code block to run in a
> child process and collecting the results. forkoff can limit the number of
> child processes which is, by default, 8.

Very neat indeed!

martin

fedzor

4/18/2008 12:23:00 PM


Since it's using Kernel#fork(), does this mean it is using OS threads?
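
One way to check this for yourself is the sketch below. It is plain Ruby, nothing in it comes from forkoff, and the variable names are only for illustration. Kernel#fork starts a child process with its own pid and its own copy of memory, so a change the child makes to a variable is not seen by the parent, whereas a Thread runs inside the same process and shares its memory.

```ruby
counter = 0

# fork: the block runs in a child process that works on a copy of counter
r, w = IO.pipe
child = fork do
  r.close
  counter += 1
  w.write("#{Process.pid} #{counter}")   # report back through the pipe
  w.close
  exit!(0)
end
w.close
child_pid, child_counter = r.read.split.map { |s| s.to_i }
r.close
Process.wait(child)

after_fork = counter          # unchanged in the parent

# thread: the block runs inside this process and shares counter
Thread.new { counter += 1 }.join
after_thread = counter        # changed by the thread

puts "parent pid: #{Process.pid}, child pid: #{child_pid}"
puts "child saw counter = #{child_counter}"
puts "parent after fork: #{after_fork}, after thread: #{after_thread}"
```

The pids vary from run to run. The counters come out as 1 in the child, 0 in the parent after the fork, and 1 in the parent after the thread has run.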

ara.t.howard

4/18/2008 4:00:00 PM



On Apr 18, 2008, at 6:23 AM, fedzor wrote:
> Since it's using Kernel#fork(), does this mean it is using OS threads?

yes. forkoff has a number of consumer *green* threads used to manage
an array of queues containing the elements destined to be passed to a
forked process/native thread for execution of the block. the code is
very short, give it a read.
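
For readers who want the general shape before opening the source, here is a minimal sketch of that kind of design. It is not forkoff's actual code: `parallel_map` is a hypothetical helper, it uses a single shared queue rather than an array of queues, and passing results back through a pipe with Marshal is an assumption for the sake of the example. Each consumer thread pops an item, forks a child to run the block, and reads the result back, so at most `processes` children exist at a time.

```ruby
require 'thread'

# Hypothetical helper, not forkoff's API: run the block for each item in a
# forked child, with one consumer thread per allowed child process.
def parallel_map(items, processes = 2, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  processes.times { queue << :done }        # one stop marker per consumer
  results = Array.new(items.size)

  consumers = Array.new(processes) do
    Thread.new do
      while (job = queue.pop) != :done
        item, index = job
        r, w = IO.pipe
        pid = fork do
          r.close
          w.write(Marshal.dump(block.call(item)))
          w.close
          exit!(0)
        end
        w.close
        results[index] = Marshal.load(r.read)  # read first, then reap the child
        r.close
        Process.wait(pid)
      end
    end
  end

  consumers.each { |t| t.join }
  results
end

squares = parallel_map((0..7).to_a, 2) { |i| i ** 2 }
p squares   # => [0, 1, 4, 9, 16, 25, 36, 49]
```

The sketch leaves out error handling: if the block raises in the child, the parent gets an empty read and Marshal.load fails. Results must also be something Marshal can dump.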

cheers.





Erik Veenstra

4/18/2008 6:19:00 PM


I once implemented Enumerable#fork myself. It doesn't use
queues or a producer-consumer-like pattern. It simply tells a
generic ThreadLimiter to spawn a new thread. Within this
thread, a new process is spawned. The number of concurrent
threads, and thus the number of concurrent processes, is
controlled by ThreadLimiter.

We might learn from both implementations.

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

Here's my code:

----------------------------------------------------------------

module Enumerable
  def fork(max_number_of_threads=nil, &block)
    thread_limiter = EV::ThreadLimiter.new(max_number_of_threads)

    collect do |x|
      thread_limiter.fork do
        Thread.current.abort_on_exception = true

        r, w = IO.pipe

        if pid = Process.fork
          w.close
          # read before waiting, so a child that writes more than the
          # pipe buffer holds cannot block forever
          data = r.read
          r.close
          Process.wait(pid)
          Marshal.load(data)
        else
          r.close
          Marshal.dump(block.call(x), w)
          w.close
          exit
        end
      end
    end.collect do |t|
      t.value
    end
  end
end

----------------------------------------------------------------

module EV
  class ThreadLimiter
    def initialize(max_number_of_threads)
      @number_of_threads = 0
      @max_number_of_threads = max_number_of_threads

      yield(self) if block_given?
    end

    def fork(*args, &block)
      # wait while the limit is reached (nil or <= 0 means no limit)
      Thread.pass while @max_number_of_threads and
                        @max_number_of_threads > 0 and
                        @number_of_threads >= @max_number_of_threads

      # If this method is called from several threads, then
      # @number_of_threads might get bigger than @max_number_of_threads.
      # This usually a) isn't the case and b) doesn't really matter (to me...).
      # I'm willing to accept this "risk", because a) Thread.exclusive is
      # much, much faster than Mutex#synchronize and b) we can't run into
      # deadlocks.

      Thread.exclusive{@number_of_threads += 1}

      Thread.fork do
        begin
          res = block.call(*args)
        ensure
          Thread.exclusive{@number_of_threads -= 1}
        end

        res
      end
    end
  end
end

----------------------------------------------------------------

Here's a benchmark:

require "benchmark"

Benchmark.bm(15) do |bm|
  rc = nil
  r2 = nil
  r4 = nil
  rx = nil

  data = 1..10
  test = lambda{|x| 1_000_000.times{7+8}; [x, Process.pid]}

  bm.report(" collect "){rc = data.collect(&test)}
  bm.report(" 2 processes"){r2 = data.fork(2, &test)}
  bm.report(" 4 processes"){r4 = data.fork(4, &test)}
  bm.report("inf processes"){rx = data.fork(-1, &test)}

  p rc
  p r2
  p r4
  p rx
end

It produces these results on a dual core machine:

                     user     system      total        real
      collect    4.530000   0.000000   4.530000 (  4.527982)
  2 processes    0.030000   0.050000   3.170000 (  1.733209)
  4 processes    0.160000   0.370000   3.610000 (  1.927826)
inf processes    0.000000   0.000000   3.080000 (  1.691932)
[[1, 18732], [2, 18732], [3, 18732], [4, 18732], [5, 18732], [6, 18732], [7, 18732], [8, 18732], [9, 18732], [10, 18732]]
[[1, 18733], [2, 18734], [3, 18735], [4, 18736], [5, 18737], [6, 18738], [7, 18739], [8, 18740], [9, 18741], [10, 18742]]
[[1, 18743], [2, 18744], [3, 18745], [4, 18746], [5, 18747], [6, 18748], [7, 18749], [8, 18750], [9, 18751], [10, 18752]]
[[1, 18753], [2, 18754], [3, 18755], [4, 18756], [5, 18757], [6, 18758], [7, 18759], [8, 18760], [9, 18761], [10, 18762]]

----------------------------------------------------------------