Re: Power based fencing in cluster causes single point of failure that can take down a cluster

Josef Whiter wrote:
You can either have redundant fence devices, or look into qdisk.
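One way to picture the first suggestion is a toy Python model, not the cluster software's actual code. It assumes each configured fence method is tried in turn and that the node counts as fenced as soon as any one method succeeds. The helper `fence_node` and the method names `power_switch` and `ipmi` are hypothetical.

```python
# Toy model of redundant fencing (hypothetical names, not a real fence agent API).
# Assumption: methods are tried in order and the node counts as fenced as soon
# as any one method reports success.
from typing import Callable, List, Optional, Tuple

FenceMethod = Callable[[str], bool]  # takes a node name, returns True on success


def fence_node(node: str, methods: List[Tuple[str, FenceMethod]]) -> Optional[str]:
    """Return the name of the first method that fenced `node`, or None if all failed."""
    for name, method in methods:
        try:
            if method(node):
                return name
        except ConnectionError:
            # An unreachable fence device is treated as a failed attempt.
            continue
    return None


def power_switch(node: str) -> bool:
    # Simulates the scenario in this thread: the network power switch is down.
    raise ConnectionError("power switch unreachable")


def ipmi(node: str) -> bool:
    # Hypothetical second, independent fence device that is still reachable.
    return True


print(fence_node("node2", [("power_switch", power_switch)]))
# -> None
print(fence_node("node2", [("power_switch", power_switch), ("ipmi", ipmi)]))
# -> ipmi
```

With only the power switch configured, no method succeeds and the failed node is never fenced. A second, independent device gives the fencing step another path to success.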

Thanks for the reply. Can you explain how qdisk would solve the problem? It seems to me that qdisk wouldn't help with a single failure that takes out both the fencing device and the cluster member it protects.

Does qdisk have some feedback mechanism that tells the cluster it's OK to restart the failed services on another node without fencing having succeeded? I can't see how that could work reliably and still prevent split-brain problems.

On Tue, Jan 09, 2007 at 10:50:53AM -0800, Jonathan Biggar wrote:
If we set up a cluster and use network power switches for fencing, won't a failure of the power switch attached to a cluster member prevent all services that were running on that node from migrating to other cluster members?

This seems to happen to us in practice: fencing the offline member fails because its power switch is unavailable, so rgmanager never migrates the failed service(s) to another member.
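A toy Python model of the stall described here follows. It assumes, as the thread implies, that a failed node's services are relocated only after that node is confirmed fenced. `recover_services` and the node and service names are hypothetical and are not rgmanager's API.

```python
# Toy model of service recovery gated on fencing (hypothetical, not rgmanager code).
# Assumption: a failed node's services are relocated only after the node is
# confirmed fenced, so a node that is merely unreachable cannot corrupt shared
# data while another member takes over its services (split brain).
from typing import Callable, Dict, List


def recover_services(
    failed_node: str,
    services: Dict[str, str],      # service name -> owning node
    survivors: List[str],
    fence: Callable[[str], bool],  # True only if fencing succeeded
) -> Dict[str, str]:
    """Return the new service placement; unchanged if fencing failed."""
    placement = dict(services)
    if not fence(failed_node):
        # Fencing failed (e.g. power switch unavailable): services stay put.
        return placement
    orphaned = [svc for svc, owner in services.items() if owner == failed_node]
    for i, svc in enumerate(orphaned):
        # Spread the orphaned services round-robin across surviving members.
        placement[svc] = survivors[i % len(survivors)]
    return placement


services = {"web": "node1", "db": "node1", "mail": "node2"}
print(recover_services("node1", services, ["node2", "node3"], lambda n: False))
# -> {'web': 'node1', 'db': 'node1', 'mail': 'node2'}
print(recover_services("node1", services, ["node2", "node3"], lambda n: True))
# -> {'web': 'node2', 'db': 'node3', 'mail': 'node2'}
```

The round-robin placement is only an illustrative choice. The model does not cover how a real cluster picks the new owner. It shows only that recovery cannot proceed while the fence call keeps failing.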

Is there a general solution to this problem that I'm missing?

--
Jon Biggar
Levanta
jon@xxxxxxxxxxx
650-403-7252

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster