Re: shutdown (OS) of all GFS nodes

I often see this when the GFS volume isn't unmounted before the cluster daemons are shut down.
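
One way to check for that before stopping cman is to look for gfs entries in /proc/mounts. This is only a minimal sketch, not a tested procedure: the function name `gfs_mounts` is made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, options, ...).

```python
def gfs_mounts(mounts_text):
    """Return the mount points whose filesystem type is gfs or gfs2.

    mounts_text is the content of /proc/mounts: one mount per line,
    whitespace-separated fields, with the filesystem type in the third field.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; field 2 is the filesystem type.
        if len(fields) >= 3 and fields[2] in ("gfs", "gfs2"):
            found.append(fields[1])
    return found
```

On a node you would pass it `open("/proc/mounts").read()`; a non-empty result means there is still something to unmount before the cluster daemons go down.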

On Feb 13, 2010 12:52 AM, "Brett Cave" <brettcave@xxxxxxxxx> wrote:

Hi,

I have a GFS cluster (4 nodes + qdisk on SAN) and have problems stopping the cman service and unmounting the GFS mount points: the shutdown hangs. I am running GFS and CLVM (the LVs are Xen guest drives). If I try to stop the cman service manually, I get an error that resources are still in use. One GFS directory is exported via NFS.

I think it may be because of service stop order, specifically openais stopping before cman - could this be a valid reason?

The kill scripts for runlevel 6 are:
K00xendomains
K01xend
K03libvirtd
K20nfs
K20openais
K74gfs
K74gfs2
K76clvmd
K78qdiskd
K79cman
K86nfslock
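
To make the resulting stop order explicit, here is a small sketch that sorts the link names above the way the stock rc script is assumed to walk them (plain lexical order of the K* names). The helper names `stop_order` and `stops_before` are made up for illustration. It only shows the ordering and says nothing about whether that ordering causes the hang.

```python
import re

# Kill-script names as listed above.
K_SCRIPTS = [
    "K00xendomains", "K01xend", "K03libvirtd", "K20nfs", "K20openais",
    "K74gfs", "K74gfs2", "K76clvmd", "K78qdiskd", "K79cman", "K86nfslock",
]


def stop_order(names):
    """Return service names in the order their K scripts would run.

    Assumption: scripts run in lexical order of the link name, so
    equal priorities (e.g. K20nfs, K20openais) are ordered by name.
    """
    return [re.sub(r"^K\d\d", "", name) for name in sorted(names)]


def stops_before(names, first, second):
    """True if service `first` is stopped earlier than service `second`."""
    order = stop_order(names)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    for position, service in enumerate(stop_order(K_SCRIPTS), start=1):
        print(position, service)
    print("openais before cman:", stops_before(K_SCRIPTS, "openais", "cman"))
```

With this list, openais (K20) is stopped before gfs, gfs2, clvmd, qdiskd and cman (K74 to K79).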


If I stop these services manually one by one, the gfs service hangs. This is the log:
Feb 13 10:35:25 vmhost-01 gfs_controld[3227]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 dlm_controld[3221]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 fenced[3215]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 4
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 3
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 2
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 1
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> cman_dispatch: Host is down
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> Halting qdisk operations
Feb 13 10:35:51 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 13 10:36:13 vmhost-01 mountd[3927]: Caught signal 15, un-registering and exiting.
Feb 13 10:36:13 vmhost-01 kernel: nfsd: last server has exited
Feb 13 10:36:13 vmhost-01 kernel: nfsd: unexporting all filesystems
Feb 13 10:36:21 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 60 seconds.

ccsd keeps repeating the last message with an increasing time: 60s, 90s, 120s, 180s, 210s, etc.

dmesg shows:
dlm: closing connection to node 4
dlm: closing connection to node 3
dlm: closing connection to node 2
dlm: closing connection to node 1

There are no open files on GFS (according to lsof).

I am using GFS (version 1).

The only workaround I have for now is to reset the nodes via iLO once the shutdown process starts and hangs on either the gfs or the cman service stop.



--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
