On Tue, Mar 20, 2007 at 08:36:59PM -0400, rhurst@xxxxxxxxxxxxxxxxx wrote:
> I ran a series of reboots, and this problem is totally reproducible. Should I be opening a ticket at Red Hat Support on this?
>
> The problem is immediate with 'service rgmanager stop', as it hangs in its sleep loop forever, even though all nodes in the cluster report that it changed its state to down. But worse than that, it also hangs all GFS I/O and the load average on all nodes start to spike (>9.00) -- I see gfs_scand in top racing away.
>
> It only gets fixed when I manually 'power reset' the node, then I get the 'Missed too many heartbeats' followed by fencing. Help.
>

Run

    echo "RGMGR_OPTS=-d" > /etc/sysconfig/cluster

then reproduce the problem and open a ticket with support. It's possible that rgmanager is waiting for one of your service scripts to stop and the script is not returning.

Also, there was a bug where bash would segfault and rgmanager would just hang. Make sure you have the newest version of bash and see if the problem still reproduces.

If none of the above helps, definitely file a support ticket. If frontline cannot figure it out, it will probably make it back to me and I'll take a look.

Josef

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
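
The debug setting suggested above could also be applied with a small shell helper. This is only a sketch and not part of the original reply: the function name `enable_rgmgr_debug` and its file argument are hypothetical, and it updates the file in place instead of overwriting it, on the assumption that the file may already hold other settings.

```shell
#!/bin/sh
# Hypothetical helper (not from the original reply): make sure
# RGMGR_OPTS=-d is set in a sysconfig-style file without clobbering
# any other settings already in it.
# Usage: enable_rgmgr_debug /etc/sysconfig/cluster
enable_rgmgr_debug() {
    conf="$1"
    # Create the file if it does not exist yet.
    [ -e "$conf" ] || : > "$conf"
    if grep -q '^RGMGR_OPTS=' "$conf"; then
        # Rewrite the existing RGMGR_OPTS line; leave other lines alone.
        tmp="$conf.tmp.$$"
        sed 's/^RGMGR_OPTS=.*/RGMGR_OPTS=-d/' "$conf" > "$tmp" && mv "$tmp" "$conf"
    else
        # No RGMGR_OPTS line yet: append one.
        echo 'RGMGR_OPTS=-d' >> "$conf"
    fi
}
```

On a node this would be called as `enable_rgmgr_debug /etc/sysconfig/cluster` before rerunning `service rgmanager stop`. On an RPM-based system, `rpm -q bash` reports the installed bash package version for the bash check.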