Jeff Harr wrote:
> Thanks Patrick. I have upped my deadnode_timeouts to 120 each.
>
> My worry though is the box somehow rebooting and joining faster than the
> other can wait its 120 seconds and take over the cluster. Is there
> another timeout value that I can tweak to keep the original, crashed
> node from rebooting and joining too quickly? Unfortunately, when the
> boxes crash they seem to come right back up and not stay dead. I think
> this might be ILO behavior, but not sure. I know when I shutdown -hy
> now, they stay down, and when the power-fencing takes place they stay
> down too, but not for crashes.
>

If the crashed node tries to join while the other node thinks it's still
in the cluster then it will get rejected and its join should fail. Of
course the other node will still think it's alive but won't be able to
talk to it because it doesn't have any services running. When the
remaining node notices it has gone then it should fence it (and cause
another power cycle!).

So things should be OK. Are you seeing actual problems?

--
patrick

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
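
The sequence Patrick describes (early rejoin rejected, then the surviving node times out and fences) can be sketched as a toy model. This is only an illustration under stated assumptions: `SurvivingNode`, `peer_went_silent`, `handle_join`, `tick` and `simulate` are hypothetical names, not cman or fenced APIs, and the 120-second value is simply the figure from Jeff's message.

```python
# Toy model only: all names here are hypothetical, not real cluster APIs.
from dataclasses import dataclass, field


@dataclass
class SurvivingNode:
    deadnode_timeout: int = 120  # seconds, the value Jeff mentions
    members: set = field(default_factory=lambda: {"node1", "node2"})
    silent_for: dict = field(default_factory=dict)  # peer -> seconds of silence
    fenced: list = field(default_factory=list)

    def peer_went_silent(self, peer):
        # Start counting how long the peer has been unreachable.
        self.silent_for[peer] = 0

    def handle_join(self, peer):
        # A peer that is still listed as a member cannot join again.
        if peer in self.members:
            return "rejected"
        self.members.add(peer)
        return "accepted"

    def tick(self, seconds):
        # Advance time; once a silent peer exceeds the timeout,
        # drop it from the membership and record a fence request.
        for peer in list(self.silent_for):
            self.silent_for[peer] += seconds
            if self.silent_for[peer] >= self.deadnode_timeout:
                self.members.discard(peer)
                del self.silent_for[peer]
                self.fenced.append(peer)


def simulate():
    survivor = SurvivingNode()
    survivor.peer_went_silent("node2")      # node2 crashes
    survivor.tick(30)                       # node2 reboots quickly
    early = survivor.handle_join("node2")   # still a member, so the join fails
    survivor.tick(90)                       # timeout reached: node2 is fenced
    return early, survivor.fenced
```

In this sketch the fast reboot does no harm: the early join fails because the survivor still lists the node as a member, and the fence follows once the timeout expires, which matches the "another power cycle" Patrick mentions.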