On Tue, 2007-11-13 at 10:04 +0100, Jos Vos wrote:
> On Mon, Nov 12, 2007 at 02:47:18PM -0800, Alex Kompel wrote:
>
> > I observed a similar problem on the test cluster. It appears the clurgmgrd
> > deadlocks in some cases in groups.c:count_resource_groups(). It does not
> > happen every time but it is reproducible. Surviving node calls
> > rg_lock(service:mysql) @ groups.c:101 and gets stuck. The other node
> > resource manager waits indefinitely for the lock:
> > [...]
>
> > To the original poster: the surviving node clurgmgrd is "unkillable" as
> > well.
> > You can try to reboot the surviving node - it will release the lock and
> > resource manager on the fenced node will be unblocked and start just fine.
> > Unfortunately, once you reboot the node the situation may reverse (resource
> > manager will hang on the rebooted node).
>
> Yes, rebooting ended up in some "locking war" and neither node came up
> properly. I finally (a) chkconfig off all cluster subsystems and (b)
> modified the cluster.conf on both nodes to turn off autostart and then
> turned off both nodes (shutting down didn't work of course). Then, I
> brought the nodes up in sequence, manually brought up the cluster
> subsystems, manually started the cluster services and finally I
> reverted (a) and (b).
>
> Is this problem solved in 5.1?

I'm not aware of what might be causing that, unless it's the same as
#338511 but in rhel5-land. Someone else might.

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster