I will try to find more information on the errors in the logs, but I think the problem was that I was using a 32-bit system and tried to expand the file system beyond 16TB. I didn't realize this was the size limit until I read the FAQ afterwards. Is this the root cause of the problem?

I now have the file system mounted again and am copying what I can off it by moving the files by name. So far we have copied over 250GB of files, and not a single file has failed to copy or caused the file system to withdraw. It is fortunate that we knew the name of every file on the file system.

I don't really understand the structure of the file system myself, so do you think it's possible that we will recover all the files using this method?

Thanks
Ben

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bob Peterson
> Sent: 25 January 2008 14:28
> To: linux clustering
> Subject: Re: Failed gfs_grow causing corrupt volume
>
> On Fri, 2008-01-25 at 12:08 +0000, Ben Yarwood wrote:
> > Trying to grow a 15TB file system to 20TB this morning, using RHEL4.4, I got an error and the grow failed.
> > The file system will still mount, but when accessed it gives the following error and withdraws:
> >
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: fatal: invalid metadata block
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: bh = 465407847 (type: exp=4, found=3)
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: function = gfs_get_meta_buffer
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: file = /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/dio.c, line = 1223
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: time = 1201260769
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: about to withdraw from the cluster
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: waiting for outstanding I/O
> > Jan 25 11:32:49 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: telling LM to withdraw
> > Jan 25 11:32:50 jrmedia-c kernel: lock_dlm: withdraw abandoned memory
> > Jan 25 11:32:50 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav.0: withdrawn
>
> Hi Ben,
>
> It sounds like you found a bug in gfs_grow. It should probably have
> cleaned up after itself when it failed. Can you tell me more about
> the gfs_grow error and possibly open a bugzilla record for it?
> Nobody else has reported a problem like this to my knowledge.
>
> Unfortunately, as far as your file system is concerned, there is not
> much that can be done. I tried to put a lot of smarts into gfs_fsck to
> repair weird and damaged RG conditions (thus the 3 levels of RG repair).
> Unfortunately, gfs_grow throws the normal ("mkfs") rules out and can put
> file system metadata in places that gfs_fsck can't reasonably predict.
>
> (I did my best to remedy that in gfs2 (gfs2_grow), but we can't
> change the on-disk format of gfs1, so we can't change that behavior there.)
>
> Regards,
>
> Bob Peterson
> Red Hat GFS
>
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster