Re: lm_dlm_cancel

James Chamberlain <jamesc@xxxxxxx> · Tue, 2 Sep 2008 10:15:25 -0400 (EDT)

On Tue, 2 Sep 2008, David Teigland wrote:

On Mon, Sep 01, 2008 at 07:55:48PM -0400, James Chamberlain wrote:
Hi all,

Since I sent the below, the aforementioned cluster crashed.  Now I
can't mount the scratch112 filesystem.  Attempts to do so crash the
node trying to mount it.  If I run gfs_fsck against it, I see the
following:

# gfs_fsck -nv /dev/s12/scratch112
Initializing fsck
Initializing lists...
Initializing special inodes...
Validating Resource Group index.
Level 1 check.
5834 resource groups found.
(passed)
Setting block ranges...
Can't seek to last block in file system: 4969529913
Unable to determine the boundaries of the file system.
Freeing buffers.

Not being able to determine the boundaries of the file system seems
like a very bad thing.  However, LVM didn't complain in the slightest
when I expanded the logical volume.  How can I recover from this?
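
One way to sanity-check the "Can't seek to last block" error is to compare the byte offset of the reported last block with the actual size of the logical volume. The following is only a minimal sketch: the 4096-byte block size is an assumption (the block size of this filesystem is not stated anywhere in this thread), and the function names are illustrative rather than part of any GFS tool.

```python
import os

# Assumption: 4096-byte filesystem blocks. The thread does not state the
# block size, so adjust this to match the real filesystem.
ASSUMED_BLOCK_SIZE = 4096


def bytes_needed(last_block, block_size=ASSUMED_BLOCK_SIZE):
    """Bytes a device must hold for block number last_block to exist."""
    return (last_block + 1) * block_size


def device_size(path):
    """Size in bytes of a file or block device, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def shortfall(path, last_block, block_size=ASSUMED_BLOCK_SIZE):
    """Bytes missing from the device, or 0 if it is large enough."""
    return max(0, bytes_needed(last_block, block_size) - device_size(path))
```

Calling shortfall("/dev/s12/scratch112", 4969529913) on an affected node would show whether the device is smaller than the filesystem believes it to be. A non-zero result would suggest the filesystem metadata describes more space than the volume provides.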

Looks like the killed gfs_grow left your fs in a bad condition.
I believe Bob Peterson has addressed that recently.

I think it was in a bad condition before I hit ^C rather than because I 
did.  As I mentioned, I was getting the lm_dlm_cancel messages before I hit 
^C.  But I'd agree that one way or another, the gfs_grow operation somehow 
left the fs in a bad state.

I'm trying to grow a GFS filesystem.  I've grown this filesystem
before and everything went fine.  However, when I issued gfs_grow
this time, I saw the following messages in my logs:

Aug 29 21:04:13 s12n02 kernel: lock_dlm: lm_dlm_cancel 2,17 flags 80
Aug 29 21:04:13 s12n02 kernel: lock_dlm: lm_dlm_cancel skip 2,17 flags 100
Aug 29 21:04:14 s12n02 kernel: lock_dlm: lm_dlm_cancel 2,17 flags 80
Aug 29 21:04:14 s12n02 kernel: dlm: scratch112: (14239) dlm_unlock: 10241 busy 2
Aug 29 21:04:14 s12n02 kernel: lock_dlm: lm_dlm_cancel rv -16 2,17 flags 40080
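
For reference, the "rv -16" in the last message corresponds to errno 16, which is EBUSY on Linux and matches the "busy" reported by dlm_unlock just before it. Below is a small sketch for pulling the fields out of such messages when scanning a long log. The regular expression and the function name are illustrative and not part of any GFS or DLM tool, and treating the flags value as hexadecimal is an assumption based on how the values look.

```python
import errno
import re

# Illustrative pattern for the lm_dlm_cancel messages quoted above.
# \s+ also matches newlines, so messages wrapped by a mail client still parse.
CANCEL_RE = re.compile(
    r"lm_dlm_cancel\s+(?:(?P<skip>skip)\s+)?(?:rv\s+(?P<rv>-?\d+)\s+)?"
    r"(?P<lock>\d+,[0-9a-f]+)\s+flags\s+(?P<flags>[0-9a-f]+)"
)


def parse_cancel(line):
    """Return the fields of an lm_dlm_cancel message, or None if no match."""
    m = CANCEL_RE.search(line)
    if m is None:
        return None
    rv = int(m.group("rv")) if m.group("rv") else None
    name = None
    if rv is not None and rv < 0:
        # Kernel return values are negated errno numbers.
        name = errno.errorcode.get(-rv)
    return {
        "lock": m.group("lock"),
        "flags": int(m.group("flags"), 16),  # assumption: flags are hex
        "skip": m.group("skip") is not None,
        "rv": rv,
        "errno_name": name,
    }
```

Feeding each log message through parse_cancel and counting the results by errno_name gives a quick picture of how often the cancel request is being refused.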

The last three lines of these log entries repeat themselves once a
second until I hit ^C.  The filesystem appears to still be up and
accessible.  Any thoughts on what's going on here and what I can do
about it?

Should be fixed by
https://bugzilla.redhat.com/show_bug.cgi?id=438268

Thanks Dave.  Any idea if there's a corresponding patch for RHEL 4?

Regards,

James
