2008/4/17 Harri Päiväniemi <harri.paivaniemi@xxxxxxxxxxxxxxx>:
>
> The 2nd problem that still exists is:
>
> When node a and b are running and everything is ok. I stop node b's
> cluster daemons. when I start node b again, this situation stays
> forever:
>
> ----------------
> node a - clustat
> Member Status: Quorate
>
>   Member Name        ID   Status
>   ------ ----        ---- ------
>   areenasql1         1    Online, Local, rgmanager
>   areenasql2         2    Offline
>   /dev/sda           0    Online, Quorum Disk
>
>   Service Name       Owner (Last)       State
>   ------- ----       ----- ------       -----
>   service:areena     areenasql1         started
>
> -------------------
>
> node b - clustat
>
> Member Status: Quorate
>
>   Member Name        ID   Status
>   ------ ----        ---- ------
>   areenasql1         1    Online, rgmanager
>   areenasql2         2    Online, Local, rgmanager
>   /dev/sda           0    Offline, Quorum Disk
>
>   Service Name       Owner (Last)       State
>   ------- ----       ----- ------       -----
>   service:areena     areenasql1         started
>
>
> So node b's quorum disk is offline, log says it's registred ok and
> heuristic is UP... node a sees node b as offline. If I reboot node b, it
> works ok and joins ok...

Now that you mention it, I remember stumbling upon a similar problem. It
happens if you restart the cluster services before the cluster realizes
the node is dead. I guess it is a bug, since at that moment the node is in
some sort of limbo state: it reports itself as part of the cluster while
the cluster does not recognize it as a member.

If you wait 70 seconds ( cluster.conf: <totem token="70000"/> ) before
starting the cluster services, it will come up fine. The reboot works for
you because it takes longer than 70 seconds (correct me if I am wrong).

So try stopping the cluster services on node b, wait 70 seconds, and then
start them back up.

-Alex
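
P.S. A minimal sketch of the "stop, wait out the token timeout, then start" workaround described above, in Python. It only reads the token value (in milliseconds) from the <totem token="..."/> element of cluster.conf and sleeps that long between stop and start. The function names, the 5-second safety margin, and the stop/start callables are my own placeholders, not part of any cluster tool; plug in whatever you normally use to stop and start the cluster daemons.

```python
import time
import xml.etree.ElementTree as ET


def token_timeout_seconds(cluster_conf_xml):
    """Return the totem token timeout from cluster.conf text, in seconds.

    The token attribute is given in milliseconds, e.g. token="70000".
    """
    root = ET.fromstring(cluster_conf_xml)
    # The <totem> element normally sits below <cluster>; also accept it as root.
    totem = root if root.tag == "totem" else root.find(".//totem")
    if totem is None or totem.get("token") is None:
        raise ValueError("no <totem token=...> found in cluster.conf")
    return int(totem.get("token")) / 1000.0


def restart_after_token_timeout(cluster_conf_xml, stop, start,
                                sleep=time.sleep, margin=5.0):
    """Stop the services, wait token timeout + margin, then start them.

    stop and start are caller-supplied callables (hypothetical hooks);
    margin is an assumed safety margin in seconds, not a documented value.
    Returns the number of seconds waited.
    """
    stop()
    wait = token_timeout_seconds(cluster_conf_xml) + margin
    sleep(wait)
    start()
    return wait
```

With the token="70000" setting quoted above, this waits 75 seconds between stopping and starting, which should give the other node enough time to declare node b dead before it tries to rejoin.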