On Mon, 2008-03-10 at 10:28 -0400, James Chamberlain wrote:
> I have just had my cluster crash yet again, but this time, I was able to
> capture the full kernel panic.
<snip>
> I'm experiencing upwards of 8 crashes a day because of this. What can I do
> about it?
>
> Thanks,
>
> James

Hi James,

The only times I've seen a problem like this were when GFS's resource
group (RG) information somehow got corrupted. I recommend doing this:

1. Unmount the file system from all nodes in your cluster.
2. Back up your storage in any way you can without it being mounted
   (dd it to other storage or to tape, for example).
3. Run gfs_fsck on the file system. If the file system is larger than
   15TB, make sure you run it on a 64-bit node.

Hopefully your system isn't too old and you have a relatively recent
version of gfs_fsck, which has the smarts to repair damaged RGs.

I'm just guessing about the corruption, but if that is the cause, the
next question is how it got corrupted. There are a number of ways that
can happen: for example, hardware problems, or running gfs_fsck while
the file system is mounted on some node.

BTW, I've only seen RG corruption two or three times in the past 2+
years.

Regards,

Bob Peterson
Red Hat GFS

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
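For reference, the three recovery steps in the reply above could be laid out as a dry-run plan along these lines. This is only a sketch and not something from the original post: the device path, the backup target, the dd block size, the -y flag for gfs_fsck (answer yes to repair prompts) and the decimal reading of "15TB" are all assumptions, and the helper names are made up for illustration. It only builds and prints the commands; it does not run them.

```python
import shlex

TB = 10**12  # assumption: decimal terabytes; the post does not say which unit


def needs_64bit_node(fs_size_bytes, threshold_tb=15):
    """True if the file system is above the 15TB mark mentioned in the reply."""
    return fs_size_bytes > threshold_tb * TB


def recovery_plan(device, backup_target):
    """Return the three steps as argv lists, in order. Nothing is executed."""
    return [
        # Step 1: unmount. This has to be repeated on every cluster node.
        ["umount", device],
        # Step 2: raw copy of the unmounted device (block size is an assumption).
        ["dd", f"if={device}", f"of={backup_target}", "bs=1M"],
        # Step 3: check and repair; -y is an assumed choice, not from the post.
        ["gfs_fsck", "-y", device],
    ]


def format_plan(device, backup_target, fs_size_bytes):
    """Render the plan as shell-style lines, with a reminder for large file systems."""
    lines = []
    if needs_64bit_node(fs_size_bytes):
        lines.append("# file system > 15TB: run gfs_fsck on a 64-bit node")
    for argv in recovery_plan(device, backup_target):
        lines.append(shlex.join(argv))
    return lines


if __name__ == "__main__":
    # Hypothetical device, backup path and size, for illustration only.
    for line in format_plan("/dev/vg0/gfs0", "/backup/gfs0.img", 16 * TB):
        print(line)
```

Printing the plan first makes it easier to confirm that the file system really is unmounted on every node before gfs_fsck is started, since running it against a mounted file system is one of the corruption causes named above.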