Matt Eagleson wrote:
Hello,
I have been evaluating a GFS cluster as an NFS solution and have
unfortunately run into a serious problem which I cannot explain.
Both of the GFS filesystems I am exporting became corrupt and unusable.
The system is Red Hat AS 4 with kernel 2.6.9-42.0.2.ELsmp. I cannot
find anything unusual on the host or the SAN at the time of this error.
Nobody was logged in to the nodes.
Can anyone help me understand what is happening here?
Here are the logs:
Hi Matt,
These errors indicate file system corruption on your SAN. The "bh ="
is the block number where the error was detected. Two of the errors
were found in GFS resource group data ("RG"), which are areas on disk
that indicate which blocks on the SAN are allocated and which aren't.
(Not to be confused with the Resource Groups in rgmanager, which is
something completely different.) The third error is usually reserved
for the quota file inode.
Corruption in the RG information is extremely rare, and may indicate a
hardware problem with your SAN. The fact that both nodes detected
problems in different areas is an indication that the problem might be
in the SAN itself rather than the motherboards, Fibre Channel cards or
memory of the nodes, although that's still not guaranteed. Many things
can cause data corruption.
I recommend you:
1. Verify the hardware is working properly in all respects. One way
   you can do this is to make a backup of the raw data to another
   device and verify the copy against the original, without GFS or any
   of the cluster software in the mix. For example, unmount the file
   system from all nodes in the cluster, then do something like
   "dd if=/dev/my_vg/lvol0 of=/mnt/backup/sanbackup" then:
   "diff /dev/my_vg/lvol0 /mnt/backup/sanbackup" (assuming of course
   that /dev/my_vg/lvol0 is the logical volume you have your GFS
   partition on, and /mnt/backup/ is some scratch area big enough to
   hold that much data.) The idea here is simply to test that reading
   from the SAN gives you the same data twice. If that works
   successfully on one node, try it on the other node. (A
   checksum-based variant of this test is sketched below the list.)
2. Once you verify the hardware is working properly, run gfs_fsck on
   it (an example invocation is sketched below the list). The latest
   version of gfs_fsck can repair most GFS RG corruption.
3. If the file system is repaired successfully, you should back it up.
4. You may want to do a similar test, only writing data to the SAN,
   then reading it back and verifying the results (see the badblocks
   sketch below the list). Obviously this will destroy the data on
   your SAN unless you are careful, so if this is a production
   machine, please take measures to protect the data before trying
   anything like this.
5. If you can read and write to the SAN reliably from both nodes
   without GFS, then try using it again with GFS and see if the
   problem comes back.
Perhaps someone else (the SAN manufacturer?) can recommend hardware
tests you can run to verify the data integrity.
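
For step 1, here is a minimal sketch of a checksum-based variant of
the read test. It assumes the same hypothetical device name as the
example above, and it avoids needing scratch space for a full raw copy
(though unlike diff, it won't tell you where a mismatch occurred):

    # Hypothetical device; substitute your own GFS logical volume.
    DEV=/dev/my_vg/lvol0

    # Read the whole device twice and checksum each pass. If the SAN
    # returns consistent data, the two sums will match.
    dd if=$DEV bs=1M | md5sum > /tmp/read1.sum
    dd if=$DEV bs=1M | md5sum > /tmp/read2.sum
    diff /tmp/read1.sum /tmp/read2.sum && echo "reads are consistent"

Run it on one node, then repeat on the other.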
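
For step 2, something like this, assuming your gfs_fsck supports the
usual fsck-style -n/-y switches:

    # Preview the damage first without changing anything on disk:
    gfs_fsck -n /dev/my_vg/lvol0

    # Then let it actually make repairs:
    gfs_fsck -y /dev/my_vg/lvol0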
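
For step 4, the badblocks utility from e2fsprogs can do a destructive
write/read pattern test on any block device. Again, this WILL destroy
everything on the volume, so only run it once your backup is verified:

    # -w = write-mode (destructive) test, -s = show progress,
    # -v = verbose
    badblocks -wsv /dev/my_vg/lvol0
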
I realize these kinds of tests take a long time to do, but if it's a
hardware problem, you really need to know. There's an outside chance
the problem is somewhere in the GFS core, but I've personally only
seen this type of corruption once or twice, so I think it's unlikely.
If you can recreate this kind of corruption with some kind of test,
please let us know how.
Regards,
Bob Peterson
Red Hat Cluster Suite
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster