Hi,

On Thu, 2009-08-27 at 13:16 -0500, Johnson, Eric wrote:
> On Thu, 2009-08-27 at 09:25 -0500, Johnson, Eric wrote:
> >> I have a 32-bit RHEL 5.3 Cluster Suite setup of two nodes with GFS2 file
> >> systems on FC attached SAN. I have run into this issue twice now, where
> >> attempts to access a certain directory within one of the GFS2 file
> >> systems never return. Other directories and paths within that file
> >> system work just fine.
> >>
> >> The first time it happened, I had to crash the node to get it to release
> >> the FS, then unmount it on both nodes, fsck it, remount it, and it was
> >> fine. It has happened again (different path, different file system). A
> >> simple "ls" in the directory (which has maybe 20 files in it) leaves the
> >> process in an uninterruptible sleep state. I left it all night and it
> >> never returned.
> >>
> >> I'm not sure what other info would be useful on this, but this is what I
> >> see from a gfs2_tool lockdump output for the ls PID on that node:
> >>
> >> G: s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4
> >> H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]
> >             ^ The W flag indicates that this is waiting for a glock
> >
> > Currently the glock is in the UN (unlocked) state, and it's trying to get
> > a SH (shared) lock. The next step in the investigation is to look for
> > the same glock number 2/bf1df on the other nodes, and see what is
> > holding that lock. This particular node will hang until the lock is
> > released on whichever other node is holding it.
> >
> > If there is nothing on any other node apparently holding that lock in
> > the glock dumps, then looking at dlm lock dumps would be the next step,
> >
> > Steve.
>
> Thanks for the response, Steve. I found this reference to that lock on
> the other node:
>
> G: s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4
> I: n:1155192/782815 t:8 f:0x00000010
>
> I'm having trouble finding documentation that describes what each of
> these fields means. There's no obvious process ID here, and all I can
> determine is that it's an exclusive lock.
>
> Eric

There is now some documentation in the upstream kernels under
linux/Documentation/filesystems/gfs2-glocks.txt and I hope to gradually
expand the docs available. There was more info in my Linux Symposium
paper too, but I see that the papers haven't appeared online yet.

The issue with the glock dump file is that it tells you how things are
now, not how they got into that state. In the example above there is a
glock in EX mode which has received a demote request; the demote has
been deferred (probably because the node had only just received the
lock when the demote request arrived) and, for some reason, it looks
like it has never been acted upon.

What should have happened is that, upon expiry of the time interval,
the glock workqueue should have been scheduled to perform the demotion
of the glock. As it is, it looks like the glock is sitting there
happily caching the data and ignoring the pending demote.

You might be able to unstick this by unmounting and remounting, or by
remounting read-only (and then remounting back to read-write right
afterwards), as that should be enough to release that lock and allow
the other node to make progress.
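Roughly, on the node holding the EX lock, that would be something like
the following (/mnt/gfs2 here is only a stand-in for your real mount
point; I've not tried this on your setup, so treat it as a sketch):

   mount -o remount,ro /mnt/gfs2   # should cause the cached EX glock to be dropped
   mount -o remount,rw /mnt/gfs2   # then straight back to read-write

Note that the read-only remount will fail with EBUSY if anything still
has a file open for writing on that mount, in which case a full
unmount/remount is the fallback.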
The real issue, though, is why the workqueue didn't get run when it
should have done. It's a tricky issue to track down, as there are a
number of possible code paths and we need a record of which ones were
taken. That can be done in the upstream code with the new tracing code
(see the PS below), but it is not easily possible in RHEL5 kernels. If
you are in a position where you can run test kernels, then we can help
track down the source of the issue,

Steve.
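PS: for reference, on an upstream kernel that has the GFS2 tracepoints,
something along these lines will record the glock state changes and
demote requests via ftrace (the debugfs path and event names here are
from the current upstream tree, so check what your kernel actually
exposes):

   mount -t debugfs none /sys/kernel/debug                # if debugfs isn't mounted yet
   echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable  # enable all gfs2 events
   cat /sys/kernel/debug/tracing/trace_pipe               # watch for gfs2_demote_rq and
                                                          # gfs2_glock_state_change

That gives the record of which code paths were taken that we can't
easily get out of the RHEL5 code.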