On Mon, Mar 13, 2006 at 10:31:18AM -0700, Dex Chen wrote:
> Hi,
>
> I believe that I saw something unusual here.
>
> I have a 3 node cluster (with GFS) using CMAN. After I shutdown 2 nodes
> in short time span, the cluster shows it lost quorum, but I run the
> clustat on the third node, and clustat shows the cluster has 3 nodes (2
> are offline) and the other services are up. I was able to access/read
> the share storage. CMAN_TOOL shows cluster lost quorum and the activity
> is blocked. What I expected is that I should not allow accessing the
> shared storage and other services at all when the cluster lost the
> quorum. Anyone has seen the similar things? What/where should I look
> into?

Quorum is the normal method of preventing an instance of some cluster
subsystem or application (a gfs mount-group, a dlm lock-space, an
rgmanager service/app/resource, etc.) from being enabled on both sides
of a partitioned cluster. It does this by preventing the creation of
new instances in inquorate clusters and by preventing recovery
(re-enabling) of existing instances in inquorate clusters.

There's one special case where we also rely on fencing to prevent an
instance from being enabled on both sides of a split at once. It's
where all the nodes that were using the instance before the
failure/partition also exist on the inquorate side of the split
afterward. If a quorate partition then forms, the first thing it does
is fence all nodes it can't talk with, which are the nodes on the
inquorate side. The quorate side then enables instances of
dlm/gfs/etc., the fencing having guaranteed there are none elsewhere.

Apart from this, each service/instance/system responds internally to
the loss of quorum in its own way. In the special case I described,
where all the nodes using the instance remain after the event, dlm and
gfs both continue to run normally on the inquorate nodes; there's been
no reason to do otherwise.

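To make the rules above concrete, here is a small toy model in Python.
It is only a sketch and not code from cman, dlm, or gfs: every name in
it (Partition, quorum_threshold, instance_state, nodes_to_fence) is
made up for illustration, and it assumes one vote per node with a
simple-majority quorum.

```python
from dataclasses import dataclass


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum, assuming a simple majority."""
    return expected_votes // 2 + 1


@dataclass(frozen=True)
class Partition:
    """A set of nodes that can still talk to each other (hypothetical type)."""
    members: frozenset
    expected_votes: int  # assumption: one vote per configured node

    @property
    def quorate(self) -> bool:
        return len(self.members) >= quorum_threshold(self.expected_votes)


def may_create_instance(p: Partition) -> bool:
    """New instances (mounts, lock-spaces, services) need quorum."""
    return p.quorate


def instance_state(users_before: frozenset, p: Partition) -> str:
    """State of an existing instance as seen from partition p."""
    if users_before <= p.members:
        # Special case: every node that used the instance is still here,
        # so no recovery is needed and it keeps running, quorate or not.
        return "running"
    if p.quorate:
        # Some users were lost; recovery is allowed, after fencing them.
        return "recovering"
    # Some users were lost and there is no quorum: recovery is blocked.
    return "blocked"


def nodes_to_fence(all_nodes: frozenset, p: Partition) -> frozenset:
    """A quorate partition fences every node it cannot talk with."""
    return all_nodes - p.members if p.quorate else frozenset()


if __name__ == "__main__":
    all_nodes = frozenset({"A", "B", "C"})
    c_alone = Partition(frozenset({"C"}), expected_votes=3)
    a_and_b = Partition(frozenset({"A", "B"}), expected_votes=3)

    print(c_alone.quorate)                                  # C has 1 of 3 votes
    print(instance_state(frozenset({"C"}), c_alone))        # only C used it
    print(instance_state(frozenset({"A", "C"}), c_alone))   # A used it too
    print(sorted(nodes_to_fence(all_nodes, a_and_b)))       # who A+B would fence
```

In this model a three-node cluster needs two votes, so C on its own is
inquorate and cannot create new instances, yet an instance that only C
was using stays in the "running" state. An instance that A had also
been using would be "blocked" on C, and a quorate A+B partition would
fence C before enabling anything.
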
I suspect what you saw is that nodes A and B failed or were shut down
but weren't using any of the dlm/gfs instances that C was. C was then
this special case, and dlm/gfs continued to run normally. If A and B
had come back and formed a partitioned, quorate cluster, they would
have fenced C before enabling any dlm or gfs instances.

Dave

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster