Re: Network hiccup + power-fencing = both nodes go down (Red Hat Cluster 4)

Patrick Caulfield <pcaulfie@xxxxxxxxxx> · Tue, 17 Jan 2006 14:03:23 +0000

Jeff Harr wrote:
> Thanks Patrick.  I have upped my deadnode_timeouts to 120 each.  
> 
> My worry though is the box somehow rebooting and joining faster than the
> other can wait its 120 seconds and take over the cluster.  Is there
> another timeout value that I can tweak to keep the original, crashed
> node from rebooting and joining too quickly?  Unfortunately, when the
> boxes crash they seem to come right back up and not stay dead.  I think
> this might be ILO behavior, but not sure.  I know when I shutdown -hy
> now, they stay down, and when the power-fencing takes place they stay
> down too, but not for crashes.
> 

If the crashed node tries to rejoin while the other node still counts it as a
cluster member, the join will be rejected and should fail. Of course the other
node will go on believing the crashed node is alive, but it won't be able to
talk to it because the rebooted node has no cluster services running.

Once the remaining node notices that its peer has gone, it should fence it
(causing another power cycle!). So things should be OK.
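
To make the ordering concrete, here is a rough toy model of the sequence
described above, written in Python. It is only a sketch: the class and method
names (SurvivingNode, handle_join, tick) are made up for illustration and are
not part of cman, and the 120-second figure is simply the deadnode_timeout
value from the quoted message. It assumes the survivor rejects joins from any
peer it still counts as a member, and evicts and fences a peer once it has
been silent for longer than the timeout.

```python
# Toy model only: names and structure are illustrative, not the cman API.
from dataclasses import dataclass, field

DEADNODE_TIMEOUT = 120  # seconds, the value from the quoted message


@dataclass
class SurvivingNode:
    # Maps each peer still counted as a member to the time (in seconds)
    # at which the survivor last heard from it.
    members: dict = field(default_factory=dict)
    # Peers the survivor has fenced, in order.
    fenced: list = field(default_factory=list)

    def heartbeat(self, peer, now):
        """Record that a message from the peer arrived at time `now`."""
        self.members[peer] = now

    def handle_join(self, peer):
        """Accept a join only if the peer is no longer counted as a member."""
        return peer not in self.members

    def tick(self, now):
        """Evict and fence peers that have been silent longer than the timeout."""
        for peer, last_seen in list(self.members.items()):
            if now - last_seen > DEADNODE_TIMEOUT:
                del self.members[peer]
                self.fenced.append(peer)


node_a = SurvivingNode()
node_a.heartbeat("node_b", now=0)          # last message before node_b crashes
early_join = node_a.handle_join("node_b")  # node_b rebooted fast: join rejected
node_a.tick(now=60)                        # still inside the timeout, no action
node_a.tick(now=121)                       # timeout passed: evict and fence
late_join = node_a.handle_join("node_b")   # node_b may now join cleanly
```

In this model a fast reboot cannot sneak the crashed node back in: its join is
refused until the survivor has timed it out and fenced it.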

Are you seeing actual problems?
-- 

patrick

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster