Dido Sevilla
1/21/2005 6:10:00 AM
I'm in the process of designing a fault-tolerant distributed
application using Ruby, and am looking at whether Rinda will be
suitable for this purpose. I am wondering what strategies are
available for recovering from the failure of the ring server, which
seems to me like a critical single point of failure. It is possible to
run the ring server/tuplespace daemon as part of a Linux-HA heartbeat
cluster to guard against physical failure of the primary ring server,
but this requires restarting the ring server on the secondary node.
The new ring server instance on the backup node then knows nothing
about the services that were registered with the old ring server
before the failure. This is unacceptable for the
distributed application. Is there a way for live services to
automagically detect failure of the ring server and reregister
themselves with it when it comes back up, or
some way for a primary and backup ring server to communicate with each
other and share information about registered services transparently?
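
To make the first option concrete, here is a rough sketch of the kind of
service-side re-registration described above. It is only an illustration
under stated assumptions, not something Rinda provides: the Reregistrar
class, its check method, the find_ring callable and the lease value are all
hypothetical names invented for this example. The only thing assumed about
the ring server is that it behaves like a tuplespace with read(pattern,
timeout) and write(tuple, lease), where a read with timeout 0 raises if
nothing matches.

```ruby
# Hypothetical sketch: a watchdog that keeps one service tuple registered
# with whatever ring server is currently reachable. Not part of Rinda.
class Reregistrar
  # tuple     - the registration tuple, e.g. [:name, :MyService, front, 'desc']
  # find_ring - callable returning a tuplespace-like object, or raising
  # lease     - lifetime in seconds requested for the tuple (assumed value)
  def initialize(tuple, find_ring:, lease: 30)
    @tuple = tuple
    @find_ring = find_ring
    @lease = lease
    @ring = nil
  end

  # One watchdog pass. Returns:
  #   :ok          the tuple is still present on the ring server
  #   :registered  the tuple was missing and has been written again
  #   :unreachable no ring server could be found or contacted
  def check
    @ring ||= @find_ring.call
    return :ok if registered?(@ring)

    @ring.write(@tuple, @lease)
    :registered
  rescue StandardError
    # Forget the cached ring server so the next pass looks it up again.
    @ring = nil
    :unreachable
  end

  private

  # True if a non-blocking read finds our tuple; any error counts as "no".
  def registered?(ring)
    ring.read(@tuple, 0)
    true
  rescue StandardError
    false
  end
end
```

In real use, find_ring could be something like
-> { Rinda::RingFinger.new.lookup_ring_any } (after DRb.start_service), with
check called from a background thread that sleeps a few seconds between
passes. A fresh ring server on the backup node would then be repopulated by
the services themselves, at the cost of a window during which lookups fail.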