Re: Cluster Shutdown - ideas?

Whatever you do, please make sure it backports to the RHEL4 packages.  This causes much woe!  I'm interested in hearing what others say about this problem.

Shawn


On Tue, Aug 12, 2008 at 6:50 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
One thing that cman does rather badly is a full cluster shutdown. With the RHEL4 code you would shut each node down in turn using the init scripts and find that everything hung, because the cluster lost quorum when the N/2th node went down.

With RHEL5 the init script was changed to do a "cman_tool leave remove" which tells the remaining nodes to reduce quorum to allow for the missing node(s).

I don't really like either of these solutions. The RHEL4 way is obviously a nuisance, but even the RHEL5 system is wrong IMHO. A normal node shutdown should not reduce quorum. If other nodes fail while that node is down the cluster runs the risk of a split brain due to reduced quorum.
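
To make the quorum arithmetic concrete, here is a small sketch of the two behaviours. It assumes one vote per node and a simple-majority rule (quorum = expected votes // 2 + 1); the function names are made up for illustration, and the real cman calculation may differ in detail.

```python
def quorum(expected_votes):
    """Simple majority: more than half of the expected votes (assumed rule)."""
    return expected_votes // 2 + 1


def shutdown_in_turn(nodes, reduce_quorum):
    """Shut nodes down one at a time.

    Returns one (remaining_nodes, quorum, quorate) tuple per shutdown step.
    reduce_quorum=False models a plain leave (expected votes stay fixed);
    reduce_quorum=True models "leave remove" (expected votes shrink).
    """
    expected = nodes
    steps = []
    for remaining in range(nodes - 1, 0, -1):
        if reduce_quorum:
            expected = remaining  # remaining nodes stop counting the leaver
        q = quorum(expected)
        steps.append((remaining, q, remaining >= q))
    return steps


if __name__ == "__main__":
    for label, flag in (("plain leave", False), ("leave remove", True)):
        print(label)
        for remaining, q, ok in shutdown_in_turn(5, flag):
            print("  remaining=%d quorum=%d quorate=%s" % (remaining, q, ok))
```

With five nodes and a plain leave, the quorum stays at 3 and the cluster stops being quorate once only two nodes remain. With "leave remove" the quorum drops along with the node count, so the last nodes can still shut down cleanly, at the cost of running with a lower quorum in the meantime.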

Those of you who have worked with VMS systems know that that OS has a CLUSTER_SHUTDOWN option which causes the cluster software to wait until all nodes have reached a shutdown barrier and then allows all of them to go down at the same time. We could do this with Linux, but I'm not really sure how much use it would be, mainly because the cluster software sits at a higher level in the OS than with VMS and there is a lot more for the computer to do after the cluster software has shut down. It is an option though.
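
The barrier idea can be simulated locally. In the sketch below, threads stand in for cluster nodes and Python's threading.Barrier stands in for the cluster-wide shutdown barrier. This is only an illustration of the synchronisation pattern, not a cman or VMS interface, and the function name is hypothetical.

```python
import threading


def cluster_shutdown(node_names):
    """Simulate a synchronised shutdown: no node goes down until all are ready."""
    barrier = threading.Barrier(len(node_names))
    log = []
    lock = threading.Lock()

    def node(name):
        with lock:
            log.append(("ready", name))  # node has reached the barrier
        barrier.wait()                   # block until every node is ready
        with lock:
            log.append(("down", name))   # all nodes proceed together

    threads = [threading.Thread(target=node, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for event, name in cluster_shutdown(["node1", "node2", "node3"]):
        print(event, name)
```

Every "ready" entry is logged before any "down" entry, which is the property the barrier provides: quorum is never lost part-way through, because nobody leaves until everybody is leaving.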

The other option is simply to set a flag (either in CMAN or locally) to tell the node or the whole cluster that everyone is being shut down. There are a few ways of doing this. The simplest is to add a flag to the cman init script (basically the opposite of what happens now in RHEL5) that causes it to run "cman_tool leave remove". But that requires the cluster software to be shut down independently of the rest of the software, thus destroying the point of ordered init scripts.

So, the flag could be an environment variable that is checked by the init script perhaps (do those get passed through?), or perhaps a flag inside cman itself that changes the "leave" behaviour to either do a "leave remove" or the synchronised cluster shutdown I mentioned earlier.
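
As a rough sketch of the environment-variable variant, the decision logic might look like this. The real init script is shell, so this Python only illustrates the choice being made; the variable name CLUSTER_SHUTDOWN and the function name are hypothetical and not something cman defines today.

```python
import os


def leave_command(environ=None):
    """Pick the cman_tool invocation for a node that is shutting down.

    CLUSTER_SHUTDOWN is a hypothetical variable used for illustration only.
    """
    env = os.environ if environ is None else environ
    if env.get("CLUSTER_SHUTDOWN", "") == "1":
        # Whole cluster is going down: let the remaining nodes reduce quorum.
        return ["cman_tool", "leave", "remove"]
    # Normal single-node shutdown: leave without reducing quorum.
    return ["cman_tool", "leave"]


if __name__ == "__main__":
    print(" ".join(leave_command()))
```

Whether such a variable would actually reach the init script depends on how the script is invoked, which is the open question raised above.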

Does anyone have any preferences, ideas or other options we might consider?

Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster


--
Shawn Hood
910.670.1819 m

