Re: shutdown (OS) of all GFS nodes

K74gfs = stop the GFS service = unmount the gfs volumes.

Specifically, wouldn't stopping the openais service first cause issues with cman / gfs, since with openais already down the gfs volumes can no longer be unmounted?
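
For what it's worth, the actual kill order is easy to check straight from the rc directory (assuming the usual /etc/rc6.d layout; ls sorts by the K number, so the output is the order init will stop things in):

ls -1 /etc/rc6.d/ | egrep -i 'openais|cman|gfs|clvmd|qdiskd|nfs'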

I ran some service shutdowns manually over the weekend, and the following process seems to work:
1) stop all services that have open files on gfs volumes or use clvm volumes (e.g. custom app referencing files, xendomains using cluster LVs)
2) service gfs stop (unmount gfs volumes)
3) service clvmd stop
4) service cman stop
5) init 6

This works without hanging, which definitely points to incorrect ordering in the init shutdown sequence. Anyone else seeing this?
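
For anyone wanting to script it, the sequence above as a rough wrapper (an untested sketch; step 1 is obviously site-specific, so adjust the xendomains / custom-app part to match what holds files open on your gfs volumes):

#!/bin/sh
# stop anything holding files open on gfs or using clustered LVs first
service xendomains stop   # Xen guests whose disks are cluster LVs
# (stop any custom apps referencing files on gfs here)
service gfs stop          # unmount the gfs volumes
service clvmd stop
service cman stop         # also tears down fenced/dlm_controld/gfs_controld
init 6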



On Sat, Feb 13, 2010 at 8:17 PM, Ian Hayes <cthulhucalling@xxxxxxxxx> wrote:

I see things like this often if the gfs volume isn't unmounted before attempting to shut down the cluster daemons.

On Feb 13, 2010 12:52 AM, "Brett Cave" <brettcave@xxxxxxxxx> wrote:

hi,

I have a GFS cluster (4 nodes + qdisk on a SAN) and have problems shutting down the cman service / unmounting the gfs mountpoints - it causes the shutdown to hang. I am running GFS & CLVM (the LVs are Xen guest drives). If I try to shut down the cman service manually, I get an error that resources are still in use. One gfs directory is exported via NFS.
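
Since nfsd holds a reference to the exported directory, dropping the export before the unmount seems worth trying; something like this (the mount point path here is just an example):

exportfs -ua            # unexport everything so nfsd releases the filesystem
service nfs stop        # stop the kernel NFS server
umount /gfs/export      # example path for the exported gfs directory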

I think it may be because of service stop order, specifically openais stopping before cman - could this be a valid reason?

The runlevel 6 (shutdown) kill scripts are:
K00xendomains
K01xend
K03libvirtd
K20nfs
K20openais
K74gfs
K74gfs2
K76clvmd
K78qdiskd
K79cman
K86nfslock
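
Note that K20openais runs well before K79cman there. If I understand it correctly, the cman initscript starts and stops aisexec itself on RHEL5, so perhaps openais shouldn't be enabled as a separate service at all - worth verifying on your version, but e.g.:

chkconfig openais off   # let the cman initscript manage aisexec instead
chkconfig --list | egrep 'openais|cman|gfs|clvmd|qdiskd'   # check what's left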


If I manually run through stopping these services in that order, the gfs service hangs. This is the log:
Feb 13 10:35:25 vmhost-01 gfs_controld[3227]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 dlm_controld[3221]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 fenced[3215]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 4
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 3
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 2
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 1
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> cman_dispatch: Host is down
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> Halting qdisk operations
Feb 13 10:35:51 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 13 10:36:13 vmhost-01 mountd[3927]: Caught signal 15, un-registering and exiting.
Feb 13 10:36:13 vmhost-01 kernel: nfsd: last server has exited
Feb 13 10:36:13 vmhost-01 kernel: nfsd: unexporting all filesystems
Feb 13 10:36:21 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 60 seconds.

ccsd keeps repeating that last message with an increasing timer: 60s, 90s, 120s, 180s, 210s, etc.

dmesg shows:
dlm: closing connection to node 4
dlm: closing connection to node 3
dlm: closing connection to node 2
dlm: closing connection to node 1

There are no open files on the GFS volumes (checked with lsof).
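
For reference, this is roughly how I checked (the mount point path is just an example; fuser gives a second opinion alongside lsof):

fuser -vm /mnt/gfs   # list any processes still using the filesystem
lsof /mnt/gfs        # cross-check with lsof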

I am using gfs (1).

The only workaround I have for now is to reset the nodes via iLO once the shutdown process starts (and hangs on either the gfs or the cman service stop).



--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
