Network hiccup + power-fencing = both nodes go down (redhat cluster 4)

"Jeff Harr" <jharr@xxxxxxxxxxxx> · Tue, 17 Jan 2006 11:48:42 -0000

Hi all, it has been a while since I posted anything.
Once again, I’d appreciate anything anyone has to say regarding this
latest issue.  Basically, we have a situation where both nodes are
suddenly unable to reach each other due to a “network hiccup”, and
they begin trying to fence each other (power fencing).  Then the
network returns and they power each other off.  My need: make Red Hat
cluster robust enough not to do this.  It could be that my configurations
are wrong, so I have attached them.

My idea/solution: I THINK I could increase post_fail_delay
to a number higher than 0, making the cluster wait to see whether
things “come back up” before fencing.  Perhaps I make one node wait
about 2 minutes for the other one to come back, and the other node wait
zero seconds, thus ensuring that the two nodes never fence each other
at the same time?
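
For reference, the setting I have in mind is the post_fail_delay
attribute on the fence_daemon element in cluster.conf.  A minimal
sketch follows; the value 30 is purely illustrative and is not taken
from the attached files, and post_join_delay is shown only so the
element looks complete:

```xml
<!-- Illustrative fragment only: 30 is an assumed value, not from my real configs. -->
<!-- post_fail_delay: seconds fenced waits after a node fails before fencing it. -->
<fence_daemon post_fail_delay="30" post_join_delay="3"/>
```

As far as I understand, this element applies to the whole cluster, so I
am not sure that giving each node a different delay is even possible
this way.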

Some small proof that the dual-reboot happened:

I know that both boxes fenced each other and “succeeded”,
and my iLO event logs show both servers being powered off.

Thanks a lot,

Jeff

Attachment: cluster_db2.conf
Attachment: cluster_db1.conf
--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster