Network hiccup + power-fencing = both nodes go down (redhat cluster 4)

Hi all, it has been a while since I posted anything. Once again, I’d appreciate any input on this latest issue. We have a situation where a “network hiccup” leaves both nodes suddenly unable to reach each other, so each begins trying to fence the other (power fencing). Then the network returns, both fence requests go through, and the nodes power each other off. My need: make Red Hat Cluster robust enough not to do this. It could be that my configurations are wrong, so I’m including them (attached).

 

My idea/solution: I THINK I could raise post_fail_delay above 0, so that a node waits to see whether things “come back up” before it fences. Perhaps one node waits something like 2 minutes for the other to come back, while the other node waits zero seconds, thus ensuring that the two never act at the same time?
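
For concreteness, here is a rough sketch of the change I have in mind. It assumes the delay is the post_fail_delay attribute on the fence_daemon element in cluster.conf; the value 30 is only a placeholder that I have not tested. As far as I can tell this is a single cluster-wide setting, so on its own it would make both nodes wait the same amount of time rather than staggering them.

```xml
<!-- Sketch only: this element sits inside <cluster> in cluster.conf. -->
<!-- post_fail_delay: seconds fenced waits after a node fails before fencing it. -->
<!-- 30 is an illustrative placeholder, not a tested recommendation. -->
<fence_daemon post_fail_delay="30"/>
```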

 

Some small proof that the dual-reboot happened:

I know that each box fenced the other and “succeeded”, and my iLO event logs show both servers being powered off.

 

Thanks a lot,

Jeff

 

Attachment: cluster_db2.conf
Description: cluster_db2.conf

Attachment: cluster_db1.conf
Description: cluster_db1.conf

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
