Re: gfs2 withdraw while copy to filesystem

Steven Whitehouse <swhiteho@xxxxxxxxxx> · Thu, 04 Oct 2007 10:40:28 +0100

Hi,

On Thu, 2007-10-04 at 08:53 +0200, Arthur MEßNER wrote:
> First I build a Xen disk image on the local storage fs,
> then want to "cp" this disk image to the gfs2 filesystem.
> 
> 
> cp disk.img /xenfs/storage1/xenmachine/
> 
> This filesystem (/xenfs/storage1) is gfs2 with option -j 10, lock_dlm
> 
> Mount options are: noatime,quota=off
> 
> I have done this several times on one node
> of my two-node cluster, no problem.
> 
> Yesterday I added another node (the third),
> and on this node the same procedure ended up with this.
> 
> Oct  4 07:44:15 xen03 kernel: GFS2: fsid=xen:storage1.1: fatal:
> assertion "x <= length" failed
> Oct  4 07:44:15 xen03 kernel: GFS2: fsid=xen:storage1.1:   function =
> rgblk_search, file = fs/gfs2/rgrp.c, line = 1116
> Oct  4 07:44:15 xen03 kernel: GFS2: fsid=xen:storage1.1: about to
> withdraw this file system
> Oct  4 07:44:15 xen03 kernel: GFS2: fsid=xen:storage1.1: telling LM to
> withdraw
> Oct  4 07:44:45 xen03 kernel: GFS2: fsid=xen:storage1.1: withdrawn
> 
> On this node and this gfs2 filesystem the crash is reproducible.
> I never tried it on another node, because they are in production.
> 
> After reboot, the gfs2 filesystem can be accessed normally.
> The same happens if I try with "dd if=xxx.img of=xxx.img".
> 
> Any suggestion where the problem is?
> Locking, gfs2 options ....
> 

This message means that when GFS2 tried to allocate some blocks, it
couldn't find any free ones in the resource group it had previously
selected and in which it had already reserved some blocks.
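
As a rough illustration of that failure mode, here is a toy model in
Python. It is not the kernel code: the class, its fields and the error
text are all made up for the example. The only point it tries to show is
that a reservation made against a cached free count can succeed while the
later bitmap search finds nothing.

```python
class ResourceGroup:
    """Toy resource group: a cached free-block count plus a bitmap.

    Hypothetical model for illustration only; names do not come from GFS2.
    """

    def __init__(self, bitmap, summary_free):
        self.bitmap = list(bitmap)        # True = block in use, False = free
        self.summary_free = summary_free  # cached count; may be out of date

    def reserve(self, n):
        # The reservation trusts the summary and never looks at the bitmap.
        if self.summary_free < n:
            raise RuntimeError("summary says not enough free blocks")
        self.summary_free -= n

    def search(self):
        # The allocation scans the bitmap for the first free block.
        for i, used in enumerate(self.bitmap):
            if not used:
                self.bitmap[i] = True
                return i
        # Blocks were reserved, yet the scan ran off the end of the bitmap.
        raise AssertionError("search ran past the end of the bitmap")


def demo(bitmap, summary_free):
    """Reserve one block, then try to allocate it; report what happened."""
    rg = ResourceGroup(bitmap, summary_free)
    rg.reserve(1)
    try:
        return "allocated block %d" % rg.search()
    except AssertionError as exc:
        return "fatal: %s" % exc


if __name__ == "__main__":
    # Consistent group: summary and bitmap agree that one block is free.
    print(demo([True, True, False, True], summary_free=1))
    # Inconsistent group: summary claims 2 free blocks, bitmap has none.
    print(demo([True, True, True, True], summary_free=2))
```

In the second call the reservation goes through because the summary
claims free space, and the failure only shows up once the bitmap is
actually searched.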

The reason that this appears only to affect a single node is that GFS2
tries to keep resource groups local to a single node where it can to
avoid having to pass the lock (and hence also the cache) of the resource
group about the cluster (which is inefficient). So this may show up on
the other nodes in the case that the filesystem gets closer to being
full (which increases the chance that the other nodes will search this
resource group).

I'd suggest in the first instance running GFS2's fsck in order to be
certain that it's a problem on the disk, but that's what it looks like to
me. It is probably just the summary information which is out of line
with the actual bitmaps on that resource group, so I wouldn't expect to
see any data loss.
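
Conceptually, the consistency check involved is a recount of the bitmap
compared against the stored summary. The Python sketch below shows only
that idea, under the same toy model as before (True = block in use); it
is not how GFS2's fsck is implemented, and the function name is invented
for the example.

```python
def check_summary(bitmap, summary_free):
    """Recount free blocks from the bitmap and compare with the summary.

    Hypothetical helper: returns (actual_free, consistent).
    """
    actual_free = sum(1 for used in bitmap if not used)
    return actual_free, actual_free == summary_free


if __name__ == "__main__":
    bitmap = [True, True, True, True]   # every block is in use
    actual, ok = check_summary(bitmap, summary_free=2)
    if not ok:
        # Only the count is wrong; the bitmap itself is left untouched.
        print("summary says 2 free, bitmap says %d; fixing summary" % actual)
```

Because the repair in this model only rewrites the count, the blocks
themselves are not touched, which is why data loss would not be expected
in that case.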

Do you know what kind of fs activity caused that in the first place?

I can't see anything else that you are doing wrong, but I wonder which
kernel version you are using?

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster