On Wed, Sep 16, 2009 at 09:20:35PM +0200, Mario 'BitKoenig' Holbe wrote:
> Bryan Mesich <bryan.mesich@xxxxxxxx> wrote:
> > The most popular mismatch_cnt values are 128 and 256. The worst I
> > found was 21504 and 7168. I find it interesting that all are
> > divisible by 128.
>
> They should not appear on RAID5.

I would agree. The only reason I mentioned RAID5 was to rule out the
possibility that the drives were spontaneously flipping bits. Our SAN
environment looks like the following:

  Initiator              |  FC Target
  --------------------------------------------------------------
                         |  Block Dev -> LVM -> RAID5 -> Block Dev
  ext3 -> LVM -> RAID1 {
                         |  Block Dev -> LVM -> RAID5 -> Block Dev

Since the RAID1 block devices on the initiator sit on top of underlying
RAID5 arrays (on the target), we should not see the random bit flips
(or other corruption) that a single drive might produce. This is only
one example of many that I have. I've found mismatches on other RAID1
arrays comprised of SATA, SAS, SCSI, and/or SAN volumes (as shown in
the above example).

> What kinds of filesystems reside on the RAID1s? If it's ext[23] (well,
> most likely on SANs it's not :)), then these mismatches are very likely
> located in inode blocks.

Yes, we are running ext3 :). Only one array is running something else
(ext4). I'm not sure I follow you on the inode problem. If there was a
problem, shouldn't it be replicated to both block devices?

> I've noticed this quite often on RAID1s up to 2.6.26, especially on
> filesystems with heavy inode fluctuation (remove, create files).
> Starting with .26 (i guess, maybe later) I didn't see them anymore.
> Maybe yours on newer kernels are just brownfields? Correct them and see
> if they appear again?

The majority of our boxes are running the default RHEL 2.6.18 kernel.
I would be quicker to blame the RHEL kernel (missing patches/back-ports),
but I am also seeing this on machines running mainline kernels (as new
as 2.6.29).
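For anyone following along, the checks being discussed are driven through
md's standard sysfs interface. A sketch below (md0 is a placeholder for
the actual array, and the writes need root). If I recall correctly, the
raid1 check compares data in 64 KiB windows and counts mismatches in
512-byte sectors, which would explain why the counts come out divisible
by 128:

```shell
# Start a read-only consistency scan of the mirror (md0 is an example
# device name).  "check" only counts mismatches; echoing "repair"
# instead would also rewrite the differing blocks.
echo check > /sys/block/md0/md/sync_action

# sync_action reads back "check" while running and "idle" when done.
cat /sys/block/md0/md/sync_action

# mismatch_cnt is reported in 512-byte sectors.
cat /sys/block/md0/md/mismatch_cnt
```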
I'm not sure that the problem can be attributed to heavy inode
fluctuation, since one of my worst-offending arrays holds only 119
files (VMware guests w/ pre-allocated disks) and is running ext4. Even
if there were filesystem problems, why don't they get replicated to
both sides of the mirror?

I worry about running a repair for fear that I might tromp over
something. I was hoping that future writes might fix some of the
problem?

> > So, is there any way to get an output on which blocks do not match?
> > I'd like to see how they are different, if at all.
>
> just cmp -l the components, as you did before md brought up the
> sync_action check target :)

Thanks for the tip :)

Bryan
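To make the `cmp -l` suggestion concrete, here is a minimal
demonstration on two ordinary files standing in for the mirror halves.
(On a real array you would point it at the two component devices; note
that a byte-for-byte comparison only lines up directly with 0.90/1.0
metadata, where the superblock sits at the end of the device.)

```shell
# Two stand-in "component devices" that differ in a single byte.
printf 'hello world' > /tmp/mirror_a
printf 'hellq world' > /tmp/mirror_b

# cmp -l lists every differing byte: 1-based offset, then the byte
# value from each file in octal.  It exits non-zero on a difference,
# hence the "|| true".
cmp -l /tmp/mirror_a /tmp/mirror_b || true
# -> 5 157 161   (byte 5: 'o' = 0157 vs 'q' = 0161)
```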