Saji N. Hameed
12/13/2008 12:39:00 PM
Hi Deepak,
As others mentioned, an adaptation of Google's MapReduce technique
may be of use. To this end, you could use Ruby's Rinda (a Linda
implementation). For my needs I wrote a small script that puts work
descriptions on a tuple space; these are taken up by one or more
workers in parallel.
If you write distinct messages that are recognized by the workers,
you could probably achieve your parallelism in a few lines without
extra libraries, except perhaps for DRbFire (to get DRb through
firewalls).
I attach it here (I am a novice Ruby programmer, so the code may not
be optimal) - hope it helps.
saji
--queue code
require 'thread'
require 'sequel'
require 'rinda/tuplespace'
require 'drb'
ts = Rinda::TupleSpace.new
DRb.start_service("druby://:3130",ts)
puts "Drb server running at #{DRb.uri}"
dbname="sqlite://testQ.db"
db = Sequel.connect(dbname)
pause = 15
loop do
  th1 = Thread.new do
    # pick the next queued job, mark it submitted, and publish it
    job = db[:jobs].filter(:status => "queued").first
    if job   # guard: there may be no queued jobs at the moment
      submit = job.merge(:status => "submitted")
      ts.write [:q1, submit]
      db[:jobs].filter(job).update(submit)
    end
  end
  th2 = Thread.new do
    # blocks until a worker posts a result tuple
    result = ts.take [:rq1, nil, nil]
    unless result[1].nil?
      p "processing images"
      p "finished image processing"
      p "update job status in database"
      db[:jobs].filter(result[1]).update(:status => "finished")
    end
  end
  th1.join   # join inside the loop: th1/th2 are local to the block
  th2.join
  sleep(pause)
end
# connect to database
# create tuplespace
# thread1
# - collect from database
# - put on tuple
# - update db
# thread2
# - check tuple
# - download data
# - update db
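For reference, the queue script assumes a jobs table with at least a
status column; a minimal schema for testQ.db might look like this
(the columns other than status are guesses):

```sql
-- Hypothetical minimal schema; only the status column is actually
-- required by the queue script above.
CREATE TABLE jobs (
  id     INTEGER PRIMARY KEY,
  name   TEXT,
  status TEXT DEFAULT 'queued'   -- 'queued' -> 'submitted' -> 'finished'
);
```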
--worker code
require 'drb'
require 'rinda/rinda'
DRb.start_service
ro = DRbObject.new_with_uri('druby://localhost:3130')
ts = Rinda::TupleSpaceProxy.new(ro)
def make_mme(job)
  # placeholder for the real work step
  "This will be passed to AFS Server: don't worry yet"
  p job
end
loop do
  job = ts.take([:q1, nil])   # blocks until the queue publishes a job
  msg = make_mme(job[1])
  ts.write [:rq1, job[1], 0]  # write the job hash and a return status
end
# worker takes job from tuple space (ts.take[:q1,..])
# executes job (make_mme)
# writes message on tuple space (ts.write[:rq1,..])
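Applied to your feed problem, the same take/write pattern might look
like the rough sketch below. It runs in a single process (the DRb
networking is dropped) and the URLs and fetch_feed stub are made-up
placeholders, so treat it only as a shape for the real thing:

```ruby
require 'rinda/tuplespace'

# in-process tuple space; with DRb this would live in a server process
ts = Rinda::TupleSpace.new

feeds = (1..6).map { |i| "http://example.com/feed#{i}" }

# enqueue one :feed tuple per URL
feeds.each { |url| ts.write [:feed, url] }

# stub standing in for the real fetch-and-parse step
def fetch_feed(url)
  "items from #{url}"
end

# three workers draining the tuple space in parallel
workers = 3.times.map do
  Thread.new do
    loop do
      _, url = ts.take([:feed, nil])            # blocks when queue is empty
      ts.write [:result, url, fetch_feed(url)]  # post the result back
    end
  end
end

# collect exactly one result per feed, then stop the idle workers
results = feeds.map { ts.take([:result, nil, nil]) }
workers.each(&:kill)
puts results.size  # => 6
```

Adding more workers (or moving them to other machines via DRb, as in
the scripts above) needs no change to the producer side - that is the
main attraction of the tuple-space approach here.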
* Deepak Gole <deepak.gole8@gmail.com> [2008-12-12 22:58:58 +0900]:
> Hi
>
> My requirement is as follows
>
> 1) I have around 200 feeds in the database that I need to parse (fetch
> contents) *in parallel* after some interval and store those feed items
> in the database.
>
> 2) Now I am using backgroundrb with 10 workers; each worker is assigned
> a job to parse data from 20 feeds (e.g. the 1st worker will fetch data
> from feeds(1..20), the 2nd from feeds(21..40), etc.).
>
> 3) But backgroundrb is not reliable and it fails after some time. So I
> have tried Starling & Workling, but those workers don't run *in
> parallel*.
>
> (I need to run in parallel because the number of feeds will increase,
> say to 1000 feeds, so I can't process them sequentially.)
>
> Please, I need help with the above problem.
>
>
> Thanks
> Deepak
--
Saji N. Hameed
APEC Climate Center +82 51 668 7470
National Pension Corporation Busan Building 12F
Yeonsan 2-dong, Yeonje-gu, BUSAN 611705 saji@apcc21.net
KOREA