Hi all, it has been a while since I posted anything.
Once again, I’d appreciate any input anyone has on this latest issue.

Basically, we have a situation where both nodes are suddenly unable to
reach each other due to a “network hiccup”, and they both begin trying
to fence each other (power fencing). Then the network returns and they
turn each other off.

My need: make Red Hat Cluster robust enough not to do this. It could be
that my configurations are wrong, so I have attached them.

My idea/solution: I think I could increase post_fail_delay to a value
higher than 0, so that a node waits to see whether things “come back up”
before it fences. Perhaps I make one node wait about 2 minutes for the
other one to come back, and have the other node wait zero seconds, thus
ensuring that the two nodes never fence at the same time?

Some small proof that the dual reboot happened: I know that both boxes
fenced the other and “succeeded”, and my iLO event logs show both
servers being powered off.

Thanks a lot,
Jeff
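
For reference, the delay discussed above is the post_fail_delay
attribute on the fence_daemon line of cluster.conf. A minimal sketch of
what the change might look like on the node that should wait, assuming
120 seconds for the "2 minutes" idea; both values shown are
illustrative assumptions, not taken from the attached files:

```xml
<!-- Fragment of cluster.conf for the node that should wait. -->
<!-- post_fail_delay="120" is an assumed value (the 2 minutes above). -->
<!-- post_join_delay="3" is only a placeholder; keep the existing value. -->
<fence_daemon post_fail_delay="120" post_join_delay="3"/>
```

The other node would keep post_fail_delay="0". Whether the two nodes can
actually run with different values is itself an assumption worth
checking, since cluster.conf is normally kept identical on all nodes.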
Attachments:
cluster_db2.conf
cluster_db1.conf
-- Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster