On 03/22/2011 04:41 PM, bergman@xxxxxxxxxxxx wrote:
> The pithy ruminations from "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx> on
> "Re: rhel6 node start causes power on of the other one" were:
>
> => Hi,
> =>
> => On 3/22/2011 11:12 AM, Gianluca Cecchi wrote:
>
> [SNIP!]
>
> => >
> => > If the initial situation is both nodes down and I start one of them, I
> => > get it powering on the other, that is not my intentional target...
>
> [SNIP!]
>
> =>
> => This is expected behavior.
> =>
>
> [SNIP!]
>
> => I am not sure why you want a one node cluster, but one easy workaround
>
> Sometimes, it's not a matter of "wanting" a one-node cluster, but being
> forced to have one temporarily. For example, if there's a hardware
> failure in one node of a 2-node cluster.
>
> I think that a likely scenario is that there's an event (for example, a
> power outage) that shuts down all nodes in a cluster, and that there is
> subsequent damage from that event (hardware failure, filesystem
> corruption on the local storage, etc.) that prevents some nodes from
> being restarted.

If the hardware has failed or doesn't boot, fencing will still happen
from the remaining node and succeed (assuming the fencing device hasn't
gone bad too), and that node will keep working. The failed node does NOT
need to rejoin the cluster at that point for the surviving node to keep
working.

The problem is that we need to differentiate between normal operations
and special situations. In a special situation like the one you
describe, where you might have to go to the server to do hardware
repair, just unplug the power cords from the fencing device (assuming
power fencing) and wait for the remaining node to fence and keep
working.

If fencing fails, you can use fence_ack_manual to override the "wait for
fencing" condition on the surviving node and allow it to operate (and
make absolutely sure the bad node is really off for good, or bad things
will happen).

> => is to start both of them at the same time, and then shutdown one of
>
> If both nodes are not available, this is not an easy work-around.
>
> => them. At that point they have both seen each other and the one going
> => down will tell the other "I am going offline, no worries, it's all good".
> =>
>
> What are the recommended alternative methods for starting a single node
> on a cluster?
>
> If the number of expected votes is set to the number of votes for the
> single node, I'm able to start a single node. However, I'm not sure
> what will happen if additional nodes in the cluster are started
> later... will there be fencing or split-brain issues if "expected
> votes" is "1" when there are 2 nodes in the cluster?

So, this area is delicate.

Adding nodes to a running cluster when the number of nodes is >= 2 is
easy. Adding nodes to go from 1 to 3 is delicate.

In some random tests I did (these are not officially supported
operations), I was able to start a one-node cluster (with literally one
node in the config) and go up to 16.

Then, assuming you are on rhel6 (I didn't test 5) and start with a
one-node cluster that's up and running:

- create a config with a higher version and 2 nodes (you cannot bump
  from 1 to a random number of nodes in one go or you will risk
  fencing/split brain; there are some rules related to quorum that could
  cause issues if not followed strictly)
- copy the new config to both nodes
- start cman on the node you want to add

At this point, the new node will join with the config at
running_version+1, immediately triggering a config reload on the active
single node, which will see the new node and recalculate quorum.
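
For illustration only, a two-node config along those lines might look
something like the sketch below. The cluster name, node names and
nodeids are placeholders, the fencing sections are left out, and whether
you want the usual two_node/expected_votes attributes during the 1 -> 2
step is exactly the kind of quorum rule you need to get right, so double
check it against the documentation for your release:

<?xml version="1.0"?>
<!-- config_version must be higher than the version currently
     running on the existing single node -->
<cluster name="mycluster" config_version="2">

  <!-- two_node/expected_votes is the common setting for a two-node
       cluster; treat it as a placeholder and verify it for your setup -->
  <cman two_node="1" expected_votes="1"/>

  <clusternodes>
    <!-- the node that is already up and running -->
    <clusternode name="node1.example.com" nodeid="1"/>
    <!-- the node being added -->
    <clusternode name="node2.example.com" nodeid="2"/>
  </clusternodes>

  <!-- a real config needs proper fencing (fence methods and
       fencedevices); omitted here for brevity -->
</cluster>

Drop that file in /etc/cluster/cluster.conf on both nodes, then start
cman on the node being added (service cman start).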
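
After the join, you can check that both nodes see each other and that
quorum was recalculated with something like (exact output varies between
versions):

cman_tool version   # both nodes should report the new config_version
cman_tool nodes     # both nodes should be listed as members
cman_tool status    # shows the recalculated quorum/expected votes

If anything looks off, stop and sort out the config before adding the
next node.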
You should be able to repeat the same operation adding one node at a
time (in some cases more, but since it's a complex and delicate
calculation, stick to one).

> Can additional nodes be brought up without affecting the services
> running on the existing node (i.e., without causing the new node to
> fence the existing node)?

Yes, in theory, but this is not a scenario we test constantly or support
as a full feature.

Clearly, all of the above assumes that the configs are correct at every
stage and that there are no other problems in between (for instance a
network issue, misconfigured iptables, etc.).

Fabio

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster