Hi Bob,

You can skip to the middle; I leave my ramblings here since I've typed
them already...

Bob Peterson <rpeterso@xxxxxxxxxx> writes:

> On Wed, 2008-02-13 at 17:22 +0100, Ferenc Wagner wrote:
>> *Here* comes something possibly interesting, after fence_tool join:
>>
>>   fenced[4543]: fencing deferred to prior member
>>
>> Though it doesn't look like node3 (which has the filesystem mounted)
>> would want to fence node1 (which has this message in its syslog). Is
>> there a command available to find out the current fencing status or
>> history?
>
> Not as far as I know. You should probably look in the /var/log/messages
> on all the nodes to see which node decided it needed to fence the other
> and why. Perhaps it's not letting you mount gfs because of a pending
> fence operation. You could do cman_tool services to see the status
> of the cluster from all nodes.

On node3, which has the GFS mounted:

# cman_tool services
type             level name     id       state
fence            0     default  00010001 none
[1 3]
dlm              1     clvmd    00030001 none
[1 3]
dlm              1     test     00050003 none
[3]
gfs              2     test     00040003 none
[3]

On node1, which can't mount the filesystem, the first half is the same,
and the second half (the dlm "test" and gfs "test" lines) is missing.

There are failed fence attempts in the logs of node3 from a couple of
days ago. Since then, node1 was rebooted a couple of times; I also
stopped the cluster infrastructure on node3 (back to killing ccsd).
Then I brought up the two nodes simultaneously, starting fenced with
the -c option; activation of the clustered VG propagated to the other
node as expected. After all this, node1 still can't mount the
filesystem. Node3 can, but that seemingly doesn't influence node1...

>> Well, it helps on the node which has the filesystem mounted. Of
>> course not on the other. Is gfs_tool supposed to work on mounted
>> filesystems only? Probably so.
>
> Some of the gfs_tool commands like "gfs_tool sb" need a device
> and others like "gfs_tool lockdump" need a mount point.
> The man page says which requires which.

Exactly. Too bad I didn't care to check earlier. Sorry for that.

> Another thing you can try is mounting it with the mount helper
> manually, with verbose mode, by doing something like this:
>
> /sbin/mount.gfs -v -o users -t gfs /dev/your/device /your/mount/point
>
> And see what information it gives you. Using the straight mount
> command won't put the mount helper into verbose mode, so you may
> get more information that way.

Not only more info, but... see:

# mount -t gfs /dev/gfs/test /mnt
mount: wrong fs type, bad option, bad superblock on /dev/gfs/test,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

# /sbin/mount.gfs -v -o users -t gfs /dev/gfs/test /mnt
bash: /sbin/mount.gfs: No such file or directory

# /usr/sbin/mount.gfs -v -o users -t gfs /dev/gfs/test /mnt
/usr/sbin/mount.gfs: mount /dev/mapper/gfs-test /mnt
/usr/sbin/mount.gfs: parse_opts: opts = "users"
/usr/sbin/mount.gfs: set flag 0 for "users", flags = 0
/usr/sbin/mount.gfs: parse_opts: flags = 0
/usr/sbin/mount.gfs: parse_opts: extra = ""
/usr/sbin/mount.gfs: parse_opts: hostdata = ""
/usr/sbin/mount.gfs: parse_opts: lockproto = ""
/usr/sbin/mount.gfs: parse_opts: locktable = ""
/usr/sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
/usr/sbin/mount.gfs: write "join /mnt gfs lock_dlm pilot:test users /dev/mapper/gfs-test"
/usr/sbin/mount.gfs: message from gfs_controld: response to join request:
/usr/sbin/mount.gfs: lock_dlm_join: read "0"
/usr/sbin/mount.gfs: message from gfs_controld: mount options:
/usr/sbin/mount.gfs: lock_dlm_join: read "hostdata=jid=1:id=65539:first=0"
/usr/sbin/mount.gfs: lock_dlm_join: hostdata: "hostdata=jid=1:id=65539:first=0"
/usr/sbin/mount.gfs: lock_dlm_join: extra_plus: "hostdata=jid=1:id=65539:first=0"
/usr/sbin/mount.gfs: mount(2) ok
/usr/sbin/mount.gfs: lock_dlm_mount_result: write "mount_result /mnt gfs 0"
/usr/sbin/mount.gfs: read_proc_mounts: device = "/dev/mapper/gfs-test"
/usr/sbin/mount.gfs: read_proc_mounts: opts = "rw,hostdata=jid=1:id=65539:first=0"

Which for me means that the plain mount tool couldn't find the helper,
as I put it into /usr/sbin instead of /sbin (probably some unfortunate
configure option or installation choice -- it's in /sbin on node3,
which runs the cluster suite compiled against the previous kernel).
I will investigate this after having some good sleep. Since the overly
general error message comes from mount itself, not from the mount.gfs
helper, you probably can't do much to improve it. But explicitly
invoking the helper in verbose mode looks like a very powerful
troubleshooting trick!

> Perhaps you should also try "group_tool dump gfs" to see if there are
> error messages from the gfs control daemon pertaining to gfs and
> why the mount failed.

That's another interesting information source. Still, it wasn't enough
for me to get out of this trap: after the above successful mount, I
unmounted the GFS with the stock umount, which again couldn't find the
helper, but did the job nevertheless -- except for leaving the mount
group, so now I can't mount the filesystem again... Now the helper
says:

message to gfs_controld: asking to join mountgroup:
write "join /mnt gfs lock_dlm pilot:test users /dev/mapper/gfs-test"
mount point already used or other mount in progress
error mounting lockproto lock_dlm

And umount.gfs can't help, as it doesn't find the mount in
/proc/mounts... Is it possible to fix this, or do I have to reboot?

>> Thanks for the clarification. And what does that deferred fencing
>> mean?
>
> That means some node decided it was necessary to fence another node
> and it is waiting for that fence to complete. If there's a pending
> fence of a node that's not completing, check to make sure your
> fence device is configured and working properly.

How could I find out about a pending fence? Only from periodic messages
in the syslog?
-- 
Thanks, Feri.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster