Re: Re: Node2 kills node1 when it is booting ...

Stewart Walters <stewart@xxxxxxxxxxxx> · Tue, 27 Jan 2009 19:26:10 +0900

carlopmart wrote:
Stewart Walters wrote:
carlopmart wrote:
carlopmart wrote:
Hi all,

 I need to set up another RHCS cluster today with two nodes. But every time 
that I start the second node, node1 returns this error:

cman killed by node 2 because we rejoined the cluster without a 
full restart

 .. and cman stops on node1. Why?? I didn't find any solution under 
http://sources.redhat.com/cluster/wiki/FAQ/

 My nodes are rhel5.3

 Many thanks.

Please, I need your help ... Any ideas???

Sounds like node1 fenced node2, and node2 hasn't been rebooted since 
being fenced. Either that, or node2 uses manual fencing and you 
haven't yet manually acknowledged that it was rebooted.

Check your logs in /var/log/messages on node1, I'm pretty sure you'll 
see a reference there that node2 has been fenced.

You'll probably also see somewhere in the logs on node1, that it 
detected node2 did not leave the cluster after being fenced, and as a 
result node1 itself has decided to stop itself to prevent data 
corruption (the message will be something like that anyway).
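A rough way to pull the fencing-related entries out of the log is sketched below. It is only an illustration: the thread does not quote the exact wording of the messages, so the sketch assumes they contain the substring "fenc" (fence, fenced, fencing), and the helper names find_fence_lines and scan_log are made up for this example.

```python
import re

# Assumption: fencing-related log entries contain "fenc" somewhere
# ("fence", "fenced", "fencing"). The exact wording is not given in the thread.
FENCE_PATTERN = re.compile(r"fenc", re.IGNORECASE)


def find_fence_lines(lines, node_name=None):
    """Return the log lines that mention fencing.

    If node_name is given, keep only lines that also contain that name.
    """
    matches = []
    for line in lines:
        if not FENCE_PATTERN.search(line):
            continue
        if node_name is not None and node_name not in line:
            continue
        matches.append(line.rstrip("\n"))
    return matches


def scan_log(path="/var/log/messages", node_name=None):
    """Scan a syslog file (usually needs root) for fencing-related lines."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    with open(path, errors="replace") as handle:
        return find_fence_lines(handle, node_name)
```

Run on node1, something like scan_log(node_name="node2") would list the candidate lines; the node name has to match whatever name the cluster uses for that node in its log messages.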

If you are using manual fencing on node2, after you reboot it you 
need to run "fence_ack_manual -n <node2>" from node1.  Do this only 
after you've restarted node2 but before cman starts back up on it in 
the next boot sequence.  At this point node1 will stop fencing node2 
and both nodes should be able to join the cluster successfully.

Manual fencing is evil :-)

Try to avoid it if you can - as you'll get this scenario on your 
cluster every time a node is fenced.  This is the reason why Red Hat 
write in their documentation numerous times that manual fencing is 
not supported in Production clusters (it's almost as if they're 
trying to tell us something...). ;-)

Also, you mentioned that the solution was not found in the FAQ.  
While it might not include a reference to these specific symptoms, I'm 
pretty sure the FAQ, the man pages for fence_manual and the RHCS 
documentation from Red Hat all cover the requirement to manually 
acknowledge nodes that use manual fencing.  If you do in fact employ 
manual fencing in your cluster, you might want to go over this 
documentation again.
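
If in doubt about which fence agent each node is actually configured with, cluster.conf can be inspected directly. The following is a minimal sketch, not a definitive tool: the SAMPLE configuration is entirely hypothetical (it is not the poster's cluster.conf), and the function names are made up for illustration. It relies only on the usual layout in which each clusternode lists device names that refer to fencedevice entries carrying an agent attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration, used only to exercise the functions below.
SAMPLE = """<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence><method name="1"><device name="gnbd-dev"/></method></fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence><method name="1"><device name="human"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="gnbd-dev" agent="fence_gnbd"/>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>"""


def fence_agents_by_node(cluster_conf_xml):
    """Map each cluster node name to the list of fence agents it uses."""
    root = ET.fromstring(cluster_conf_xml)
    # Device name -> agent, taken from the <fencedevices> section.
    agents = {dev.get("name"): dev.get("agent") for dev in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        # Each <device> under a node refers to a <fencedevice> by name;
        # an unknown name maps to None.
        result[node.get("name")] = [agents.get(dev.get("name")) for dev in node.iter("device")]
    return result


def nodes_using_manual_fencing(cluster_conf_xml):
    """Return the sorted names of nodes that reference the fence_manual agent."""
    by_node = fence_agents_by_node(cluster_conf_xml)
    return sorted(name for name, used in by_node.items() if "fence_manual" in used)
```

To check a real cluster, read /etc/cluster/cluster.conf into a string and pass it to nodes_using_manual_fencing; an empty list means no node references fence_manual.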

If you don't use manual fencing, please accept my apologies for 
expressing my general distaste for manual fencing instead of actually 
helping you!! :-)

Kind Regards,

Stewart

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

Many thanks for your help Stewart, but I don't use manual fencing as 
the fence device in this cluster. I am using gnbd for that.

I am posting my cluster.conf:

------------------------------------------------------------------------

Silly question then: have you actually restarted (i.e. fully 
rebooted) node1?

Regards,

Stewart
