Re: Clusterbehaviour if one node is not reachable & fenceable any longer?

On 29/01/14 12:42 PM, Nicolas Kukolja wrote:
Digimer <lists <at> alteeve.ca> writes:


99% of the time, I agree totally. Logs and configs are super helpful. In
this case though, I am pretty sure I know exactly what's happening. :)

digimer

Thanks for the explanation, digimer. You understood exactly what I mean and
what happens. Unfortunately, that is what I was afraid of...

The three nodes in my scenario are located about 200km from each other.
If one of the nodes with all infrastructure around it (PDUs, Switches,
IPMI...) is not reachable any longer because of a power outage or a full
network outage at this location, switching a PDU is not possible either...

That would mean that in this (very probable) case, the cluster will not
help me?

Do you have any suggestions for what I can do to work around this case?

Kind regards,
Nicolas

And this is the fundamental problem of stretch/geo-clusters.

I am loath to recommend this, because it's soooo easy to screw it up in the heat of the moment, so please only ever do this after you are 100% sure the other node is dead:

If you log into the 2 remaining nodes that are blocked (because of the inability to fence), you can type 'fence_ack_manual'. That will tell the cluster that you have manually confirmed the lost node is powered off.

Again, USE THIS VERY CAREFULLY!

It's tempting to make assumptions when you've got users and managers yelling at you to get services back up. So much so that Red Hat dropped 'fence_manual' entirely in RHEL 6 because it was too easy to blow things up. I cannot stress enough just how critical it is that you confirm that the remote location is truly off before doing this. If it's still on and you clear the fence action, then really bad things could happen when the link returns.
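
One way to make the "confirm first" rule harder to skip is to wrap the command in a small guard. The following is only a rough sketch, not an official tool: the checklist items and the function name are hypothetical, and it assumes the installed 'fence_ack_manual' takes the lost node's name as its argument (check 'man fence_ack_manual' on your release, since the syntax may differ). It only builds the command; running it is still left to a human.

```python
# Hypothetical pre-flight guard; the checklist wording is illustrative only.
REQUIRED_CHECKS = (
    "someone on site confirmed the node is powered off",
    "the node cannot boot back into the cluster unattended",
)


def build_ack_command(node, confirmed):
    """Return the manual fence acknowledgement command as an argv list.

    Raises RuntimeError unless every item in REQUIRED_CHECKS is present
    in `confirmed`. Assumption: fence_ack_manual accepts the node name
    as a positional argument on this release.
    """
    missing = [check for check in REQUIRED_CHECKS if check not in confirmed]
    if missing:
        raise RuntimeError(
            "refusing to acknowledge fence for %s; unconfirmed: %s"
            % (node, "; ".join(missing))
        )
    return ["fence_ack_manual", node]
```

Called with every check confirmed, it returns the argv list for the operator to run by hand on the blocked nodes; with anything missing, it raises instead of producing a command.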

digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



