Re: gfs2 hang

Steven Whitehouse <swhiteho@xxxxxxxxxx> · Wed, 02 Jan 2008 09:21:22 +0000

Hi,

On Thu, 2007-12-27 at 13:31 -0800, Scooter Morris wrote:
> Greetings,
>     We've got a two-node cluster running RHEL 5.1 that we've been 
> experimenting with and have discovered a problem with gfs2.  As part of 
> our build environment, we have some find scripts that walk a directory tree:
> 
> #! /bin/sh
> for html in `/usr/bin/find curGenerated -name \*.html -print` ; do \
>  cat $html > tmpCR.html ; \
>  /bin/mv tmpCR.html $html ; \
> done
> 
> The curGenerated directory has about 141 subdirectories, each of which 
> has from 2-10 subdirectories.  What we find is that this find script 
> will hang the operating system when it is executed within a gfs2 
> partition that is shared between the two nodes.  Fencing is configured 
> and detects the hung node and restarts it, but that's not much of a 
> consolation.  The gfs2 partition lives on a Fibre Channel array (HP
> EVA5000), and quotas are not turned on.  The gfs2 filesystem continues 
> to operate normally on the other node.
> 
> Is this a known bug in gfs2?   Is there something we could do to help 
> find this problem?
> 
> Thanks!
> 
> -- scooter
> 
I think this is probably a known bug, bz #404711, which is fixed upstream
and is also fixed for 5.2. It triggers when rename has to allocate an
extra block for the directory while also unlinking an existing target
file, and both of these operations happen to fall in the same resource
group. The mv in your script is exactly such a rename, since each .html
target already exists when it is overwritten.
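If you want to confirm that your loop takes the rename-with-unlink path, a quick check is to compare the inode number of a file before and after the cat/mv pair: the temporary file is created while the original still exists, so the name ends up pointing at a different inode and the old one is unlinked. This is only a sketch: the scratch directory and the file name page.html are made up for illustration, it runs on any local filesystem, and it does not reproduce the hang itself, which also depends on block allocation and resource group layout.

```sh
#! /bin/sh
# Sketch: show that "cat f > tmp; mv tmp f" replaces f with a new inode,
# i.e. the rename lands on an existing target, which gets unlinked.
# Uses a throwaway scratch directory; page.html is a hypothetical name.
set -e

dir=$(mktemp -d)
printf '<html></html>\n' > "$dir/page.html"

# Inode number of the original file.
before=$(ls -i "$dir/page.html" | awk '{print $1}')

# Same two steps as the loop body in the original script.
cat "$dir/page.html" > "$dir/tmpCR.html"
/bin/mv "$dir/tmpCR.html" "$dir/page.html"

# Inode number now behind the same name.
after=$(ls -i "$dir/page.html" | awk '{print $1}')

echo "inode before: $before  after: $after"
if [ "$before" != "$after" ]; then
    echo "target was replaced by rename (old inode unlinked)"
fi

rm -rf "$dir"
```

Running your real loop on a test filesystem with the fixed kernel is still the definitive check.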

If that turns out not to be the cause, then please file a bugzilla.

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster