bigendian+gfs@xxxxxxxxx wrote:
Hello Robert,
The other node was previously rebuilt for another temporary purpose
and isn't attached to the SAN. The only thing I can think of that
might have been out of the ordinary is that I may have pulled the
power on the machine while it was shutting down during some file
system operation. The disk array itself never lost power.
I do have another two machines configured in a different cluster attached to the
SAN. CLVM on the machines in that other cluster does show the volume I'm having
trouble with, though those machines do not mount the device. Could this have
caused the trouble?
More importantly, is there a way to repair the volume? I can see the
device with fdisk -l and gfs_fsck completes with errors, but mount
attempts always fail with the "mount: /dev/etherd/e1.1 already mounted
or /gfs busy" error. I don't know how to debug this at a lower level
to understand why this error is happening. Any pointers?
Hi Tom,
Well, if gfs_fsck aborted prematurely, it may have left your lock protocol in an
unusable state. Ordinarily, gfs_fsck temporarily changes the locking protocol to
"fsck_xxxx" to prevent someone from mounting the file system while it's busy
doing the file system check. When it's done, it sets it back to "lock_xxxx".
However, older versions of gfs_fsck weren't setting it back to "lock_xxxx" when
they bailed out due to errors. That's since been corrected in the latest version
of gfs_fsck, which I think is in U4. Try this:
gfs_tool sb /dev/etherd/e1.1 proto
If it says "fsck_dlm" or anything else starting with "fsck", then it's wrong.
To fix it, do:
gfs_tool sb /dev/etherd/e1.1 proto lock_dlm
(That's for DLM locking; use lock_gulm instead if you're using GULM locking.)
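Putting it together, here's a rough sketch of the whole check-and-fix sequence.
It assumes DLM locking, the device path above, and that the file system is
unmounted on every node (gfs_tool will want that before it rewrites the
superblock, and it may ask you to confirm):

# 1. See which locking protocol is recorded in the GFS superblock.
gfs_tool sb /dev/etherd/e1.1 proto

# 2. If it reports "fsck_dlm" (or anything starting with "fsck_"), put
#    the real protocol back.  Substitute lock_gulm if the cluster uses
#    GULM rather than DLM.
gfs_tool sb /dev/etherd/e1.1 proto lock_dlm

# 3. Retry the mount, then check the kernel log if it still fails.
mount -t gfs /dev/etherd/e1.1 /gfs
dmesg | tail -20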
If it still doesn't let you mount, look in dmesg for error messages about why
the mount failed. If the logical volume really is in use, you can rule out the
other systems by doing "vgchange -an etherd" on the other machines, and try
again.
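As for debugging the "already mounted or busy" error at a lower level, something
like the following usually narrows down what's holding the device (the device
and volume group names below are just the ones from your mail; adjust as
needed):

# On the node that can't mount:
grep etherd /proc/mounts        # is it somehow still mounted?
fuser -v /dev/etherd/e1.1       # any local process with the device open?
dmsetup info -c                 # open counts on any LVs layered on it

# On each node of the other cluster, deactivate the volume group so
# CLVM there can't be holding the device, then retry the mount here:
vgchange -an etherd

# After a failed mount attempt, the kernel usually logs the real reason:
dmesg | tail -30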
I'm still confused, though: are you running the latest gfs_fsck, and was it able
to repair the damaged RGs or not? Did it error out, or did it go through all the
passes, 1 through 5?
Regards,
Bob Peterson
Red Hat Cluster Suite
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster