Re: partly OT: failover <500ms

Lon Hohberger <lhh@xxxxxxxxxx> · Thu, 01 Sep 2005 17:39:59 -0400

On Thu, 2005-09-01 at 21:58 +0200, Jure Pečar wrote:
> Hi all,
> 
> Sorry if this is somewhat offtopic here ...
> 
> Our telco is looking into linux HA solutions for their VoIP needs. Their
> main requirement is that the failover happens in the order of a few 100ms. 
> 
> Can redhat cluster be tweaked to work reliably with such short time
> periods? This would mean heartbeat on the level of few ms and status probes
> on the level of 10ms. Is this even feasible?

Possibly, but I don't think it can do it right now.  A few things to
remember:

* For such a fast requirement, you'll want a dedicated network for
cluster traffic and a real-time kernel.

* Also, "detection and initiation of recovery" is all the cluster
software can do for you; your application - by itself - may take longer
than this to recover.

* It's practically impossible to guarantee completion of I/O fencing in
this amount of time, so your application must be able to do without it,
or you need to create a new specialized fencing mechanism which is
guaranteed to complete within a very short time.

* I *think* CMAN currently works at whole-second granularity, so some
changes would need to be made to give it finer granularity.  This
shouldn't be difficult (but I'll let the developers of CMAN answer this
definitively... ;) )

* Clumanager 1.2.x (RHCS3) can theoretically operate at sub-second
failure detection, but not at the levels you require (also, doing so is
neither tested nor supported).
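
To make the arithmetic behind the points above concrete, here is a rough
sketch in Python.  The function name, its parameters, and every number in
it are assumptions for illustration only; none of them are measurements
or actual cluster settings.  It simply adds detection time (heartbeat
interval times the number of missed beats) to fencing and application
recovery time:

```python
def failover_budget_ms(heartbeat_interval_ms, missed_beats,
                       fencing_ms, app_recovery_ms):
    """Approximate time from node failure to restored service, in ms.

    Hypothetical model: a failure is declared once `missed_beats`
    consecutive heartbeats have been missed, then fencing runs, then
    the application recovers.  All inputs are illustrative assumptions.
    """
    detection_ms = heartbeat_interval_ms * missed_beats
    return detection_ms + fencing_ms + app_recovery_ms


# Whole-second heartbeats: detection alone is far beyond a 500 ms target,
# even with fencing and application recovery assumed to cost nothing.
coarse = failover_budget_ms(1000, 3, 0, 0)

# 10 ms heartbeats leave some room, but only if fencing and application
# recovery are also assumed to finish within a few hundred milliseconds.
fine = failover_budget_ms(10, 5, 200, 150)

print(coarse, fine)
```

With these made-up numbers, the second case only fits under 500ms
because fencing and application recovery were assumed to be fast, which
is exactly the part the cluster software cannot promise.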

-- Lon

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster