----- "Koustubha Kale" <koustubha_kale@xxxxxxxxx> wrote:
| Hi all,
| We have a three node GFS2 cluster on a CentOS 5.4 output of uname -a
| GFS2 errors and file system withdrawls, nodes restarting. The error in
| log is as shown below..

Hi,

What version of fsck.gfs2 did you use to fix these errors?

Not that long ago, I discovered that fsck.gfs2 does not always clean up
everything it should on the first pass. Sometimes it finds and fixes
more inconsistencies on the second run. The situation will be much
better when the 5.5 release is out, but I've found some serious problems
even in the 5.5 version. For example, when orphaned dinodes are tossed
into lost+found, it can sometimes get the block accounting wrong.

I've got a better, faster fsck.gfs2 on my people page for people to try.
This one is more thorough, has better block accounting and has added
error checking, so it should do a much better job of cleaning things up.
It's had a lot of testing and has gotten a lot of positive feedback from
other people too:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2

This is an x86_64 version.

I recommend these steps:

1. Download this experimental fsck.gfs2 to some directory.
2. Unmount the file system from all nodes.
3. Save off a copy of the file system metadata:
   gfs2_edit savemeta /dev/device /some/file.meta
   This saved copy means you can always go back if fsck.gfs2 makes
   some kind of mistake.
4. Run the new fsck.gfs2 on the file system.

See if that helps the situation.

Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
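
For anyone who wants to script the four steps recommended above, here is a rough sketch rather than a tested procedure. The function names (run, gfs2_repair), the DRY_RUN switch, the use of wget and the download directory are my own assumptions; only the URL and the gfs2_edit savemeta command come from the message. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the recommended recovery steps. Prints commands unless DRY_RUN=0.
# run, gfs2_repair and DRY_RUN are illustrative names, not part of any GFS2 tool.

FSCK_URL="http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2"

run() {
    # In dry-run mode (the default) just show the command; otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

gfs2_repair() {
    device=$1      # block device holding the GFS2 file system (placeholder)
    meta_file=$2   # where to save the metadata copy (placeholder)
    fsck_dir=$3    # directory for the downloaded binary (assumed layout)

    # Step 1: download the experimental fsck.gfs2 (wget is an assumption).
    run wget -O "$fsck_dir/fsck.gfs2" "$FSCK_URL"
    run chmod +x "$fsck_dir/fsck.gfs2"

    # Step 2: unmounting on every node must be done by hand on each node;
    # this script only prints a reminder.
    echo "# Confirm $device is unmounted on ALL nodes before continuing"

    # Step 3: save the file system metadata so a bad repair can be undone.
    run gfs2_edit savemeta "$device" "$meta_file"

    # Step 4: run the downloaded fsck.gfs2 against the device.
    run "$fsck_dir/fsck.gfs2" "$device"
}
```

Calling gfs2_repair /dev/device /some/file.meta /tmp/gfs2 prints the plan. Setting DRY_RUN=0 would actually run it, and that should only be done after checking by hand that no node still has the file system mounted.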