Sorry for the blank message.

Thanks for the advice. Unfortunately the file system is part of a back end for a website which is permanently in use, so taking it offline overnight is not really an option. We do have a virtually live backup copy, so I will get this fully synchronised and then try the gfs_fsck process you suggested.

As a quick solution, I think I'll just unmount the file system from all nodes and run gfs_fsck until I see pass one start, then kill it. Hopefully the problem will have been solved by that point. If this doesn't work, I'll mount the backup and run the full gfs_fsck. If that doesn't work, I'll rebuild the whole file system.
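For what it's worth, the sequence I have in mind is roughly the following (the device path and mount point are just placeholders, not our real ones):

  # On every node in the cluster:
  umount /gfs/web

  # From one node only, run the check against the block device:
  gfs_fsck -y /dev/vg_web/lv_gfs
  # Watch for the messages about repairing the resource groups and the
  # RG index, then interrupt with <ctrl-c> as soon as Pass 1 starts.

  # Remount on all nodes and check the reported sizes:
  mount -t gfs /dev/vg_web/lv_gfs /gfs/web
  df -h /gfs/web

The idea being to let the RG repair run, since that happens before the passes, rather than waiting for the passes themselves to chew through 15TB of metadata.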
Regarding gfs_grow, in my experience it often (if not always) fails when the file system being grown is in use. Previously I have just waited until it fails and then run it again once I have stopped whatever process was accessing that particular file system. I'm not sure what possessed me to hit <ctrl-c> this time, not my finest moment!

Regards
Ben

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bob Peterson
> Sent: 28 August 2007 15:31
> To: linux clustering
> Subject: Re: gfs_grow
>
> On Tue, 2007-08-28 at 10:08 +0100, Ben Yarwood wrote:
> > I am using a 3 node cluster running RHEL4U4.
> >
> > I ran a gfs_grow yesterday on one of our file systems but stupidly missed a process that was
> > using the same file system. The grow process hung, and when I got it to exit the file system
> > was reporting the larger size but no extra space had appeared. Basically my file system grew
> > from 14TB to 15TB and my usage also grew from 13TB to 14TB.
> >
> > Does anyone know if it's possible to get this space back? I know I could probably do a
> > gfs_fsck, but given the size of the file system this would take a few days according to some
> > previous reports.
> >
> > Thanks
> > Ben
>
> Hi Ben,
>
> The fact that there was a process using the file system shouldn't have
> been a problem, and gfs_grow should have been able to work around it.
> It would have been interesting to see where gfs_grow was "hung", but it's
> too late for that now. My guess is that you killed gfs_grow before it
> was able to update the resource group index properly.
>
> In RHEL4U4 there is a feature in gfs_fsck to check and repair damaged
> RGs and RG indexes. Things get tricky for the code once the file system
> has been extended, though, so although you probably don't want to hear
> this, you should probably make a backup of your data first, just to be
> safe.
>
> Running gfs_fsck will take a while on a file system that big, but it
> depends on the speed of your hardware. I'd expect it to take less than
> a day to complete. If you can't afford the down time, it might be
> helpful to know that the RG repair is done before any of the passes, so
> in theory you could probably use it to repair the RGs and then
> kill the gfs_fsck. Newer versions of gfs_fsck will catch <ctrl-c>
> interrupts and give you options to skip around parts, but I don't think
> that's in RHEL4U4 (I think it got into RHEL4.5).
>
> So I guess my recommendation is:
>
> 1. Make a backup of your data.
> 2. Wait until most people have gone home for the day.
> 3. Unmount the file system from ALL nodes.
> 4. Run gfs_fsck.
> 5. Watch the gfs_fsck output for messages about finding and fixing
>    RG damage, just so you know it did something.
> 6. Let gfs_fsck run overnight.
> 7. If you need the file system back and gfs_fsck is still running by
>    morning, you could kill it manually. It would be better to let it
>    run, but it shouldn't do any harm to kill it prematurely if necessary.
> 8. Remount the file system and see if df shows the correct values.
>
> I hope this helps.
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster