On Mon, 31 Aug 2009 14:22:07 -0700
Rick Stevens <ricks@xxxxxxxx> wrote:

> I don't see that there's anything to fix. You had a three-node
> cluster so you needed a majority of nodes up to maintain a quorum.
> One node died, killing quorum and thus stopping the cluster

Nope. Quorum is still there. I have 3 nodes with qdisk, and two nodes
remained in quorum. Then I had to reboot the nodes because of some
multipath/scsi changes, and after that they only try to fence the
missing node, they can't reach its fencing device, and rgmanager is
not showing up in my output. Quorum is regained after both nodes
restart.

So basically what I mean is that you cannot start the cluster with one
node and its fence device missing, although you have gained quorum.
Two nodes and qdisk is much more than I need - I need only one node +
qdisk for the cluster to function properly.

> As a three-node cluster, it's dead.
> It can't be run as a three-node cluster until the third node is
> fixed. Those are the rules.

Well, this is the part that I don't like :) Why can't I, for example,
put 10 missing nodes in my cluster.conf? If the other nodes don't gain
quorum, they shouldn't start services and that's it; but if they do
gain quorum, what's the point of constantly trying to fence the
missing node through its missing fence device?!

> A two node cluster requires special handling of things to prevent the
> dread split-brain situation, which is what two_node does. Running the
> surviving nodes as a two-node cluster is, by definition, a
> reconfiguration. I'd say simply requiring you to set two_node is
> pretty damned innocuous to let you run a dead (ok, mortally wounded)
> cluster.
>
> If you pulled a drive out of a RAID6--thus degrading it to a RAID5--
> would you complain because it didn't remain a RAID6?

First of all, RAID6 without one disk _IS NOT_ RAID5. In terms of
redundancy they are the same, but the on-disk data is not, so the two
are not equal. And yes - I would complain if I had to _REBUILD_ the
degraded array into a RAID5. If the array were unavailable until that
rebuild finished, that would be a major issue - what's the point of
redundancy if I lose the whole array/cluster when one unit fails? But
with RAID6 I don't have to. As a matter of fact, I can lose one more
drive and leave it in that state until I buy two new drives and
hotplug them into the chassis. In other words: as long as quorum is
maintained, the array and the data on it are not jeopardized. With
RHCS that should be the same, shouldn't it?

I'm just asking: why can't I leave the missing node in the
configuration, where it will become active again once it returns from
the dealer? Why do I have to reconfigure the cluster? That is not good
behaviour IMHO - there should be some command to mark a node as
missing, and the cluster should work fine with two nodes + qdisk
because it has quorum. Isn't that the point of quorum? What's the
point of a cluster if one node cannot malfunction and be taken away
for repairs without the need to set up a new cluster?

In your RAID6 analogy, it's as if taking away one disk broke the array
until you rebuilt it...

--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/  |
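
For reference, here is roughly the kind of cluster.conf I mean - a
minimal sketch of a three-node cluster plus quorum disk where the
absent node simply stays in the configuration until it comes back.
The hostnames, addresses, fence agent and qdisk label below are
made-up placeholders, not my real config:

<?xml version="1.0"?>
<cluster name="example" config_version="1">
  <!-- 3 nodes x 1 vote + qdisk with 2 votes = 5 expected votes,
       so a single node + qdisk (3 votes) is still quorate -->
  <cman expected_votes="5"/>
  <quorumd interval="1" tko="10" votes="2" label="example_qdisk"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-node2"/>
        </method>
      </fence>
    </clusternode>
    <!-- node3 is away at the dealer and its fence device is
         unreachable, but the entry stays so it can rejoin later -->
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-node3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo-node1" agent="fence_ilo" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice name="ilo-node2" agent="fence_ilo" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
    <fencedevice name="ilo-node3" agent="fence_ilo" ipaddr="10.0.0.3" login="admin" passwd="secret"/>
  </fencedevices>
</cluster>

With the votes set up like that, node1 + qdisk alone carries 3 of the
5 expected votes and is quorate, which is exactly the "one node +
qdisk" case I described above - so I don't see why the surviving
nodes should have to be reconfigured just because node3's fence
device can't be reached.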