Oh my dear Alex,

It really does work that way! I just can't believe it - you are one hell of a genius. I had no clue it could be something this simple. It really works. I feel stupid.

So, I was really going crazy with this cluster ver 5 yesterday, but now it seems that both of my problems are solved:

1. Unable to bring just one node up in a 2-node cluster - hanging in fencing / fence failed.
   Reason: cman was told (by RH) to be started before qdisk, and this is the wrong way around. Qdisk has to be started first in this situation, so fence_tool is not left wondering why the cluster is not quorate ;) (A startup sketch follows the quoted message below.)

2. Restart of cluster daemons not successful.
   Reason: you have to wait out the "token timeout" before starting again ;) (A restart recipe also follows below.)

Great. Thanks for all your help. RH support has been puzzling over these problems for 3 weeks now without success.

-hjp

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx on behalf of Alex Kompel
Sent: Fri 4/18/2008 4:10
To: linux clustering
Subject: Re: Severe problems with 64-bit RHCS on RHEL5.1

2008/4/17 Harri Päiväniemi <harri.paivaniemi@xxxxxxxxxxxxxxx>:
>
> The 2nd problem that still exists is:
>
> When node a and b are running and everything is ok, I stop node b's
> cluster daemons. When I start node b again, this situation stays
> forever:
>
> ----------------
> node a - clustat
>
> Member Status: Quorate
>
> Member Name                ID   Status
> ------ ----                ---- ------
> areenasql1                 1    Online, Local, rgmanager
> areenasql2                 2    Offline
> /dev/sda                   0    Online, Quorum Disk
>
> Service Name               Owner (Last)     State
> ------- ----               ----- ------     -----
> service:areena             areenasql1       started
>
> -------------------
> node b - clustat
>
> Member Status: Quorate
>
> Member Name                ID   Status
> ------ ----                ---- ------
> areenasql1                 1    Online, rgmanager
> areenasql2                 2    Online, Local, rgmanager
> /dev/sda                   0    Offline, Quorum Disk
>
> Service Name               Owner (Last)     State
> ------- ----               ----- ------     -----
> service:areena             areenasql1       started
>
> So node b's quorum disk is offline, the log says it registered ok and
> the heuristic is UP... node a sees node b as offline. If I reboot
> node b, it works ok and joins ok...

Now that you have mentioned it - I remember stumbling upon a similar
problem. It happens if you restart the cluster services before the
cluster realizes the node is dead. I guess it is a bug, since the node
is in some sort of limbo state at that moment, reporting itself as
part of the cluster while the cluster does not recognize it as a
member. If you wait 70 seconds ( cluster.conf: <totem token="70000"/> )
before starting the cluster services, then it will come up fine. The
reboot works for you because it takes longer than 70 sec (correct me
if I am wrong).

So try stopping node b cluster services, wait 70 secs and then start
them back up.

-Alex
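For the archives, here is roughly the startup order that now works for
me, as a sketch from memory (the init script names are the stock
RHEL 5.1 ones on my boxes; your layout may differ, so treat this as a
starting point rather than gospel):

    # start the quorum disk daemon first, so the node can gain quorum
    # before cman's fence_tool join runs - the opposite of what the
    # RH docs told me, but it is what fixed the fencing hang here
    service qdiskd start
    service cman start
    service rgmanager start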
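And Alex's 70-second rule as a restart recipe, for the next person who
restarts the daemons too quickly (the 70 comes from our
<totem token="70000"/> in cluster.conf; if your token timeout differs,
wait at least that long):

    # stop in the reverse order of startup
    service rgmanager stop
    service cman stop
    service qdiskd stop
    # wait out the totem token timeout (70000 ms here) so the other
    # node declares this one dead before it tries to rejoin
    sleep 70
    service qdiskd start
    service cman start
    service rgmanager start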
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster