Re: GFS/CS blocks all I/O on 1 server reboot of 11 nodes?

On Tue, Mar 20, 2007 at 08:36:59PM -0400, rhurst@xxxxxxxxxxxxxxxxx wrote:
> I ran a series of reboots, and this problem is totally reproducible.  Should I be opening a ticket at Red Hat Support on this?
> 
> The problem is immediate with 'service rgmanager stop', as it hangs in its sleep loop forever, even though all nodes in the cluster report that it changed its state to down.  But worse than that, it also hangs all GFS I/O and the load average on all nodes starts to spike (>9.00) -- I see gfs_scand in top racing away.
> 
> It only gets fixed when I manually 'power reset' the node, then I get the 'Missed too many heartbeats' followed by fencing.  Help.
> 
>

Run

echo "RGMGR_OPTS=-d" > /etc/sysconfig/cluster

then reproduce the problem and open a ticket with support.  It's possible that
rgmanager is waiting for one of your service scripts to stop and the script is
not returning.  There was also a bug where bash would segfault and rgmanager
would just hang, so make sure you have the newest version of bash and see
whether the problem still reproduces.  If none of the above helps, definitely
file a support ticket.  If frontline cannot figure it out, it will probably make
its way back to me and I'll take a look.

Josef
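
Editor's note: the echo command in the reply uses ">", which replaces the
whole of /etc/sysconfig/cluster.  If that file already holds settings you want
to keep, something like the sketch below may be safer.  The helper name
set_sysconfig_opt is hypothetical and not part of rgmanager or Cluster Suite;
only the file path and the RGMGR_OPTS=-d setting come from the message above.

```shell
# Hypothetical helper (not shipped with any cluster package):
# set KEY=VALUE in a sysconfig-style file, replacing an existing KEY line
# or appending a new one, and leaving all other lines untouched.
# Assumes VALUE contains no '|' character, since '|' is the sed delimiter.
set_sysconfig_opt() {
    file=$1
    key=$2
    value=$3
    touch "$file" || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite the matching line in a temp copy, then copy it back so
        # the original file keeps its ownership and permissions.
        tmp=$(mktemp) || return 1
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Intended use on a cluster node, as root:
#   set_sysconfig_opt /etc/sysconfig/cluster RGMGR_OPTS -d
#   rpm -q bash    # shows the installed bash package version
```

Calling the helper a second time with a different value updates the existing
RGMGR_OPTS line instead of adding a duplicate.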

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
