Re: rhel6 node start causes power on of the other one

On 03/22/2011 03:54 PM, Gianluca Cecchi wrote:
> On Tue, 22 Mar 2011 11:47:58 +0100, Fabio M. Di Nitto wrote:
>> For RHEL related questions you should always file a ticket with GSS.
> 
> yes, it is my usual behaviour, but typically I prefer to analyze in
> advance and know whether a problem I'm encountering is a bug or only my
> own misunderstanding of the docs...

GSS should be able to help you with that too :)

> 
>> This is expected behavior.
>> The node that is booting/powering on will gain quorum by itself and,
>> since it does not detect the other node for N seconds, it will
>> perform a fencing action to make sure the other node is not accessing
>> any shared resource.
> 
> I have been using clusters with RHEL 4.x (x=6 and 8) without quorum
> disks and clusters with RHEL 5.y (y=3,4,5) with quorum disks, and I have
> also tested a Fedora 13 cluster.
> All of them were two-node clusters and I don't remember this behaviour.

The behaviour is the same. RHEL6 is derived from the F13 code base.
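
To put a number on the "N seconds" above: in a plain two-node cman/fenced
setup that window is the fence daemon's post_join_delay. A minimal
cluster.conf sketch (cluster name, node names and the delay value are just
placeholders, and the fence devices/methods are left out):

  <?xml version="1.0"?>
  <cluster name="mycluster" config_version="1">
    <!-- either node alone is quorate in a two-node cluster -->
    <cman two_node="1" expected_votes="1"/>
    <!-- seconds fenced waits for a missing node to join before fencing it -->
    <fence_daemon post_join_delay="60" post_fail_delay="0"/>
    <clusternodes>
      <clusternode name="node1" nodeid="1"/>
      <clusternode name="node2" nodeid="2"/>
    </clusternodes>
  </cluster>

Raising post_join_delay only postpones the startup fencing of the missing
node, it does not disable it.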

> I limit my example to a 2-node cluster.
> I thought that the sequence when both nodes are down and one starts was:
> a) the fence daemon notices that the other node is down
> (with the status option of the fence command)
> b) the fence daemon waits for the configured amount of time, based on
> cluster.conf values or the defaults, to "see" the other node coming up
> c) after this amount of time fenced completes its start phase and the
> remaining startup phases take place
> In particular, if a quorum disk is defined (and expected votes = 2), when
> the node becomes the master for the quorum disk, the cluster is formed
> and services are started,
> without any power-on of the other node at all....

With qdiskd things change a bit, and I don't recall the exact details.
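
For completeness, the qdiskd side of that setup is driven by the <quorumd>
section of cluster.conf, roughly like the sketch below (the label, votes,
timings and heuristic are placeholders only, see qdisk(5) for the real
tuning rules):

  <!-- with a quorum disk, two_node is not used and expected_votes is raised -->
  <cman expected_votes="3"/>
  <!-- qdiskd heartbeats the shared disk and contributes its own vote -->
  <quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <heuristic program="ping -c1 -w1 192.168.1.1" score="1" interval="2"/>
  </quorumd>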

> 
>> I am not sure why you want a one-node cluster,
> 
> This is intended only for maintenance windows where, for example, I
> might prefer to:
> - power off both nodes
> - start up and update the first one (so that the second remains
> unchanged as a rollback path)
> - test/verify it
> - let it start alone and be an active one-node cluster (possibly with quorum)
> - update the second node and let it join the cluster again

There is a specific document that explains how to upgrade cluster nodes.

Quick and dirty:

1) stop the cluster on one of the nodes (leaving the other active with services)
2) upgrade the node
3) reboot the node

At this point, the newly upgraded node will join the cluster just fine (we
do test/support this scenario).

4) migrate one service at a time from the old node, test and see if it
works.

Once migration is completed and your system is verified, upgrade the
remaining node.
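
Spelled out as commands (a rough sketch against the RHEL6 init scripts; the
service and node names are placeholders, and whether you also have GFS2
mounts or other bits to stop first depends on what the node runs):

  # --- on the node to be upgraded ---
  clusvcadm -r myservice -m node2   # optionally relocate its services first
  service rgmanager stop            # stop the resource manager on this node
  service cman stop                 # leave the cluster cleanly
  yum update                        # upgrade the packages
  reboot

  # --- after the reboot ---
  clustat                           # the node should be a cluster member again
  clusvcadm -r myservice -m node1   # move services over one at a time and test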

In an HA environment you have just reduced your downtime a lot by following
the correct procedure rather than shutting down both nodes, and you still
keep one node on the new software and one on the old as a rollback path for
as long as you need to test the newly upgraded node, with the only service
downtime being the migrations themselves.

Fabio


