comp.lang.ruby

Ruby IPC In an OpenMosix Cluster

the.liberal.media

3/22/2006 10:12:00 PM

I'm undertaking a project that will eventually become a processing
pipeline application of sorts. It will receive data in a common
format, transform it into one of _many_ other formats, compile and send
it off to various endpoints (sounds spammish, but it's all solicited,
really :).

My team and I have tentatively decided on openMosix to provide an
easily scalable cluster, and Ruby for the application itself. I'm very
new to Ruby, and fairly new to IPC concepts -- The Little Book of
Semaphores and the many threads I've read here have helped me out a
_lot_.

Our application will rely on three pools of consumer processes; each
pool is spawned by a daemon responsible for one basic operation (think
MTA): receive, transform, compile/send. By using processes instead of
threads we allow openMosix to migrate each process and make use of the
entire cluster.
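
For the daemon/pool part, what I have in mind is roughly the following
(untested sketch; the worker body is just a placeholder):

  # One daemon spawning a pool of consumer processes. Each child is a
  # separate OS process, so openMosix is free to migrate it to another node.
  POOL_SIZE = 4

  def consume(worker_id)
    loop do
      # placeholder: pull the next unit of work and process it
      sleep 1
    end
  end

  pids = POOL_SIZE.times.map do |i|
    fork { consume(i) }   # fork returns the child pid in the parent
  end

  trap("TERM") { pids.each { |pid| Process.kill("TERM", pid) } }
  pids.each { |pid| Process.waitpid(pid) }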

So we have the model down, but I need a bit of advice on how to most
efficiently get these processes talking. What is the best form of IPC
to use here? It seems there are tons of Ruby examples on concurrency
and communication between threads, but I can't seem to find anything
definitive on IPC (to more than one child at least). I tried the sysv
extension off RAA, but couldn't get it to compile -- though I didn't
try my best.

Things I'm considering:

- DRb (rough sketch below)
- UNIXSocket
- mkfifo
- SysV message queue (openMosix doesn't support shmem segments)
- popen (though I can't see how to do it without round-robin producing)
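
Of those, DRb is the one I can picture most clearly -- e.g. a transform
daemon serving an object that the receive processes call into (untested;
host/port and the transform itself are made up):

  require 'drb/drb'

  # --- on the "transform" node: serve an object over DRb ---
  class Transformer
    def transform(data)
      data.upcase   # placeholder for the real format conversion
    end
  end

  DRb.start_service('druby://0.0.0.0:9000', Transformer.new)
  DRb.thread.join

  # --- on a "receive" node: call it like a local object ---
  #   DRb.start_service
  #   transformer = DRbObject.new_with_uri('druby://transform-host:9000')
  #   transformer.transform('incoming record')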

If anyone has any advice, please shove me in the right direction.

Thanks in advance!

Also, thanks to matz for the great language (code blocks are uber
bueno)!

Best,
Dan

7 Answers

Ara.T.Howard

3/22/2006 10:27:00 PM


the.liberal.media

3/23/2006 1:14:00 AM


Right on, Ara. Thanks for your input!

I had looked at rq very briefly, but at first glance it seemed like it
might be a hassle to maintain as we add nodes (having to
update/kill/restart the application on all nodes). What's been your
experience with regards to maintenance?

Thanks,
Dan

the.liberal.media

3/23/2006 3:54:00 AM


Ok, so I went back and actually read through the entire rq article this
time (and noticed who wrote it -- many props Ara :).

From what I understood, you're suggesting something like this:

1. Use dirwatch to wait for incoming data (files) on an NFS-exported dir
2. Inject jobs into rq for each incoming file
3. rq executes commands on each node that read each file in from the NFS
mount (a worker along the lines of the sketch below)
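
Where the command rq runs on each node would be a small worker script
along these lines (hypothetical paths, and the transform is a placeholder):

  #!/usr/bin/env ruby
  # Worker that rq would execute on a node: takes the path of one incoming
  # file on the shared NFS mount, processes it, and writes the result out.

  path = ARGV[0] or abort "usage: #{$0} /nfs/incoming/FILE"

  data = File.read(path)

  # placeholder for the real transform/compile/send steps
  transformed = data.reverse

  out = path.sub('/incoming/', '/outgoing/')
  File.open(out, 'w') { |f| f.write(transformed) }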

How fast is a setup like this? I would think there would be a lot of
overhead in forking a process for each job, and even more in the
NFS/file I/O. We're shooting for 100 jobs/second, starting with a
fairly small cluster and then scaling up. Each piece of data is 4-10 KB.

My thought was to spawn a pool of processes once, then start feeding
them the data via [unknown IPC]. Seems like that would be a faster
solution as long as openMosix is efficient in redirecting the IO across
nodes. Of course, this may be a development nightmare (learning
experience), since neither my team nor I have a lot of experience with
multiprocessing.
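
To make that concrete, the "spawn once and feed" idea would look roughly
like this over plain pipes with Marshal (untested sketch; a real version
needs error handling and something smarter than round-robin dispatch):

  # Pre-fork a pool of workers and feed them jobs over pipes, serializing
  # each job with Marshal and a 4-byte length prefix.
  POOL_SIZE = 4

  writers = POOL_SIZE.times.map do
    reader, writer = IO.pipe
    fork do
      writer.close
      loop do
        header = reader.read(4) or break   # EOF: parent closed the pipe
        job = Marshal.load(reader.read(header.unpack('N').first))
        # placeholder: transform/compile/send the job here
      end
    end
    reader.close
    writer
  end

  jobs = [{ :id => 1, :body => 'foo' }, { :id => 2, :body => 'bar' }]
  jobs.each_with_index do |job, i|
    payload = Marshal.dump(job)
    writers[i % POOL_SIZE].write([payload.size].pack('N') + payload)
  end

  writers.each { |w| w.close }
  Process.waitall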

If rq would satisfy our speed requirements, then I would love to avoid
the extra development time. Perhaps we'll just have to build a basic
prototype and run some tests. :)

Best,
Dan

Ara.T.Howard

3/23/2006 6:40:00 AM


Ara.T.Howard

3/23/2006 6:55:00 AM


the.liberal.media

3/23/2006 7:08:00 PM


> hmmm. not quite clear on what you are asking - but we regularly add and
> remove nodes. you don't need to stop all nodes to do this at all - to add a
> node simply start a feeder on it, to remove a node simply stop that node's
> feeder.

> is that what you are asking?

No, actually I really just spat out the wrong question before I read
the entire article and understood the rq setup. :)

Dan

the.liberal.media

3/23/2006 7:15:00 PM


> yup this would definitely push the limits __unless__ you can batch them.

Not sure if this is an option yet.

> let me know if you go this route as i have upgrades to both dirwatch and rq
> that you'll surely want.

I will definitely let you know. I'm going to experiment with a more
generic drb setup first, and see how that goes. Do your updates
contain the stdin/stdout support, or is that already in the newest
release?
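
By a "generic drb setup" I mean something like a Rinda::TupleSpace acting
as a shared job queue that the worker pool pulls from (again, just a
sketch; URIs are made up):

  require 'drb/drb'
  require 'rinda/tuplespace'

  # --- queue server (one node) ---
  ts = Rinda::TupleSpace.new
  DRb.start_service('druby://0.0.0.0:9100', ts)
  DRb.thread.join

  # --- producer (receive daemon), in its own process ---
  #   DRb.start_service
  #   ts = DRbObject.new_with_uri('druby://queue-host:9100')
  #   ts.write([:job, 'incoming record'])

  # --- consumer (transform worker), in its own process ---
  #   DRb.start_service
  #   ts = DRbObject.new_with_uri('druby://queue-host:9100')
  #   loop do
  #     _, data = ts.take([:job, nil])   # blocks until a job is available
  #     # transform/compile/send data here
  #   end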

Thanks again for all your help.

Dan