David Teigland wrote:
> On Wed, Mar 22, 2006 at 08:59:17AM +0100, Alain Moulle wrote:
>
>>>>You might set a fencing delay that would allow the dump to complete, e.g.
>>>> <fence_daemon post_fail_delay="10">
>>>> </fence_daemon>
>
>>OK but does that mean that once we have patched this, the peer node will
>>wait in all cases this delay before fencing the node with problem, even
>>if this node is not dumping , right ?
>
> When fenced goes to fence a failed node, it waits 10s before actually
> killing it. That applies to all nodes that fail.
>
>>So, the workaround that you propose is to be used only this way :
>>1. a node has crashed and was about to dump but has been fenced.
>>2. patch the post_fail_delay
>>3. re-start CS4 on both nodes
>>4. wait for a new crash and dump, and in this case, the failover
>> will take at least the post_fail_delay value.
>
> I'm not sure what you mean by this, but it doesn't sound right.
> post_fail_delay would be added permanently to cluster.conf which
> is the same on all nodes... you don't change it.
>
> Dave

Yes, that is what I understood. Since a dump can take, let's say, 20 minutes,
I would have to set

 <fence_daemon post_fail_delay="1200">

but only when there is a real problem, so that the failed node can finish
its dump. We can only consider this once we have already had a system crash,
because it makes failover take so long; otherwise we must keep the fence
delay at 10s. That is why I proposed the list above. Right?

Alain

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
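For reference, a minimal sketch of how the permanent setting discussed above might sit in cluster.conf, assuming a dump of about 20 minutes (20 x 60 = 1200 seconds). The cluster name and config_version values are placeholders and are not taken from this thread; the other sections of the file are omitted.

```xml
<?xml version="1.0"?>
<!-- Placeholder name and config_version; assumptions for illustration only. -->
<cluster name="example" config_version="2">
  <!-- 20 min dump x 60 s/min = 1200 s wait before fenced kills a failed node -->
  <fence_daemon post_fail_delay="1200">
  </fence_daemon>
  <!-- clusternodes, fencedevices, etc. omitted -->
</cluster>
```

As Dave points out, the delay applies to every node that fails, whether or not it is dumping, so with this value each failover would take at least 1200 seconds.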