Asp Forum - Recovering from failure of a Rinda ring server

Dido Sevilla

1/21/2005 6:10:00 AM

I'm in the process of designing a fault-tolerant distributed
application using Ruby, and am looking at whether Rinda will be
suitable for this purpose. I am wondering what strategies are
available for recovering from the failure of the ring server, which
seems to me like a critical single point of failure. It is possible to
run the ring server/tuplespace daemon as part of a Linux-HA heartbeat
cluster to guard against physical failure of the primary ring server,
but this requires restarting the ring server on the secondary node.
The new ring server instance running on the backup server is now
ignorant of all services that were previously registered on the old
ring server before the failure. This is unacceptable for the
distributed application. Is there a way for live services to
automagically detect failure of the ring server, and automatically
reregister themselves with it when it goes back up in that case, or
some way for a primary and backup ring server to communicate with each
other and share information about registered services transparently?

1 Answer

Eric Hodel

1/22/2005 7:52:00 AM

On 20 Jan 2005, at 22:10, Dido Sevilla wrote:

> I'm in the process of designing a fault-tolerant distributed
> application using Ruby, and am looking at whether Rinda will be
> suitable for this purpose. I am wondering what strategies are
> available for recovering from the failure of the ring server, which
> seems to me like a critical single point of failure. It is possible to
> run the ring server/tuplespace daemon as part of a Linux-HA heartbeat
> cluster to guard against physical failure of the primary ring server,
> but this requires restarting the ring server on the secondary node.
> The new ring server instance running on the backup server is now
> ignorant of all services that were previously registered on the old
> ring server before the failure.

Yup.

> This is unacceptable for the
> distributed application. Is there a way for live services to
> automagically detect failure of the ring server, and automatically
> reregister themselves with it when it goes back up in that case, or
> some way for a primary and backup ring server to communicate with each
> other and share information about registered services transparently?

1) Run more than one RingServer, and have each cross-register the
other's services. (Whatever does the cross-registering doesn't have to
run on the RingServer itself, actually...)

2) The RingServer removes services automatically when a service's
renewer fails to respond. A renewer is invoked when the registration's
timeout expires. On the service side, if the service's renewer is not
invoked within a timeout of its own, you could have the service
re-register itself, something like IRC's PING/PONG handshake.
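
A rough sketch of option 2, in case it helps. None of the names below
(WatchedRenewer, find_ring, register, keep_registered) are part of Rinda;
they are made up for illustration, and the lifetime/timeout values are
guesses. It assumes the ring server calls the renewer's renew method once
when the tuple is written and again each time its lifetime runs out, and
that the timeout is comfortably longer than the lifetime plus however
often the server checks for expired tuples.

```ruby
require 'drb/drb'
require 'rinda/ring'

# Hypothetical helper (not part of Rinda): a renewer that records when the
# ring server last called it, so the service can notice when the calls stop.
class WatchedRenewer
  include DRb::DRbUndumped # passed by reference, so the server calls back into this process

  # lifetime: seconds granted to the tuple on each renewal
  # timeout:  seconds of silence after which the ring server is presumed gone
  # clock:    callable returning the current Time (injectable for testing)
  def initialize(lifetime: 30, timeout: 120, clock: -> { Time.now })
    @lifetime = lifetime
    @timeout = timeout
    @clock = clock
    @mutex = Mutex.new
    @last_call = clock.call
  end

  # Called by the tuplespace; the return value is the new lifetime in seconds.
  def renew
    @mutex.synchronize { @last_call = @clock.call }
    @lifetime
  end

  # True when the server has not asked for a renewal within the timeout.
  def stale?
    @mutex.synchronize { @clock.call - @last_call > @timeout }
  end
end

# Use a fresh RingFinger for every lookup. The class-level
# Rinda::RingFinger.primary caches the first server it finds, which may be
# the one that just died. Raises if no ring server answers.
def find_ring(timeout = 5)
  Rinda::RingFinger.new.lookup_ring_any(timeout)
end

# Same tuple layout that Rinda::RingProvider writes.
def register(ring, name, front, desc, renewer)
  ring.write([:name, name, front, desc], renewer)
end

# Watchdog thread: re-register whenever the renewer has gone quiet.
def keep_registered(name, front, desc, renewer, interval: 5)
  Thread.new do
    loop do
      sleep interval
      next unless renewer.stale?
      begin
        register(find_ring, name, front, desc, renewer)
      rescue StandardError => e # e.g. no ring server has answered yet
        warn "re-registration failed: #{e.message}"
      end
    end
  end
end
```

The service would call DRb.start_service first (so the ring server can
reach the renewer), then register(find_ring, ...) once, then
keep_registered(...) with the same renewer. When a replacement ring
server comes up, the next watchdog pass finds it and writes the tuple
again.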

--
Eric Hodel - drbrain@segment7.net - http://se...
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

comp.lang.ruby
