On Fri, 2005-09-02 at 08:03 +0100, Patrick Caulfield wrote:
> Lon Hohberger wrote:
> > On Thu, 2005-09-01 at 21:58 +0200, Jure Pečar wrote:
> >
> >> Hi all,
> >>
> >> Sorry if this is somewhat off-topic here ...
> >>
> >> Our telco is looking into Linux HA solutions for their VoIP needs.
> >> Their main requirement is that failover happens on the order of a
> >> few hundred ms.
> >>
> >> Can Red Hat cluster be tweaked to work reliably with such short
> >> time periods? This would mean heartbeats at the level of a few ms
> >> and status probes at the level of 10 ms. Is this even feasible?
> >
> > Possibly; I don't think it can do it right now. A couple of things
> > to remember:
> >
> > * For such a fast requirement, you'll want a dedicated network for
> > cluster traffic and a real-time kernel.
> >
> > * Also, "detection and initiation of recovery" is all the cluster
> > software can do for you; your application - by itself - may take
> > longer than this to recover.
> >
> > * It's practically impossible to guarantee completion of I/O
> > fencing in this amount of time, so your application must be able to
> > do without it, or you need to create a new specialized fencing
> > mechanism which is guaranteed to complete within a very short time.
> >
> > * I *think* CMAN currently works at whole-second granularity, so
> > some changes would need to be made to give it finer granularity.
> > This shouldn't be difficult (but I'll let the developers of CMAN
> > answer this definitively, though... ;) )
>
> All true :) All cman timers are calibrated in seconds. I did run some
> tests a while ago with them in milliseconds and 100 ms timeouts, and
> it worked /reasonably/ well. However, without an RT kernel I wouldn't
> like to put this into a production system - we've had several
> instances of the cman kernel thread (which runs at the top RT
> priority) being stalled for up to 5 seconds and that node being
> fenced. Smaller stalls may be more common, so with timeouts set that
> low you may well get nodes fenced for small delays.
>
> To be quite honest I'm not really sure what causes these stalls. As
> they generally happen under heavy I/O load, I assume (possibly
> wrongly) that they are related to disk flushes, but someone who knows
> the VM better may put me right on this.
>
> These systems could have swap..

Swap doesn't work here, because a swapped-out page can take 1-10
seconds to be faulted back into memory. The mlockall() system call
resolves this particular problem (see the sketch at the end of this
mail).

The poll(), sendmsg(), and recvmsg() system calls (and some others
that require memory) can block while allocating memory under
low-memory conditions. This unfortunately means longer timeouts are
necessary when the system is overloaded. One solution would be to
change these system calls, via some kind of socket option, to allocate
the memory for their operation ahead of time, but I don't know of
anything like that yet.

I have measured failover with openais at 3 msec from detection to
direction of new CSIs within components. Application failures are
detected in 100 msec. Node failures are detected in 100 msec.

It is possible on a system that meets the above scenario for a
processor to be excluded from the membership during low memory. This
is a reasonable choice, because the processor is having difficulty
responding to requests in a timely fashion, and it should be removed
until overload control software on the processor cleans up its memory.

Regards
-steve
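P.S. For anyone who wants to experiment, here is a minimal sketch of
the mlockall() fix mentioned above. The program around the call is
mine, not taken from cman or openais, and locking memory requires root
(or CAP_IPC_LOCK on newer kernels):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
            /* Lock all current and future pages of this process into
             * RAM so nothing can be swapped out; a heartbeat thread
             * then never stalls 1-10 seconds on a swap-in. */
            if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
                    perror("mlockall");
                    exit(EXIT_FAILURE);
            }

            /* ... cluster daemon main loop would run here ... */

            return 0;
    }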
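The same experiment would also want the heartbeat thread at a
real-time priority, analogous to the cman kernel thread Patrick
mentions. In userspace that might look like this (again only a sketch;
the helper name is hypothetical, and this too requires root):

    #include <sched.h>
    #include <stdio.h>

    /* Hypothetical helper: move the calling thread to the highest
     * SCHED_FIFO priority, similar to the cman kernel thread running
     * at the top RT priority. */
    static int set_top_rt_priority(void)
    {
            struct sched_param sp;

            sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
            if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
                    perror("sched_setscheduler");
                    return -1;
            }
            return 0;
    }

Note that even at the top SCHED_FIFO priority a stock kernel can still
stall you, as Patrick has seen, which is why the RT kernel
recommendation stands.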
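And on the millisecond-granularity timers: poll() already takes its
timeout in milliseconds, so a userspace heartbeat receiver can detect
a silent peer in 100 ms without special kernel support - the hard part
is getting scheduled in time, per the above. A hypothetical receive
loop (the function name and fd are mine):

    #include <poll.h>
    #include <stdio.h>

    /* Hypothetical: wait up to 100 ms for a heartbeat datagram on
     * socket fd. Returns 1 if data arrived, 0 on a missed heartbeat,
     * -1 on error. */
    int wait_for_heartbeat(int fd)
    {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int ret = poll(&pfd, 1, 100);   /* timeout in msec */

            if (ret < 0)
                    perror("poll");
            return ret;
    }

Keep in mind the caveat above: even poll() can block allocating memory
when the box is low on memory, so this is not a hard guarantee.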