On Wed, Dec 2, 2009 at 6:38 PM,  <greg@xxxxxxxxxxxx> wrote:
> On Nov 30, 2:08pm, Farkas Levente wrote:
> } Subject: /etc/cron.weekly/99-raid-check
>
>> hi,
>
> Hi Farkas, hope your day is going well. Just thought I would respond
> for the edification of others who are troubled by this issue.
>
>> it's been a few weeks since rhel/centos 5.4 was released, and there
>> has been much discussion about its new "feature", the weekly raid
>> partition check. we've got lots of servers with raid1 systems, and i
>> have already tried to configure them not to send these messages, but
>> without success. i have already added all of my swap partitions to
>> SKIP_DEVS (since i read on the linux-kernel list that swap can show a
>> non-zero mismatch_cnt, though i still don't understand why). but even
>> the data partitions (i.e. all raid1 partitions on all of my servers)
>> produce this error (i.e. their mismatch_cnt is never 0 at the
>> weekend), and as a result all of my raid1 partitions are rebuilt
>> during the weekend. i don't like it :-(
>> so my questions:
>> - is it a real bug in the raid1 system?
>> - is it a real bug in my disks running raid (not likely, since it
>>   happens on dozens of servers)?
>> - is /etc/cron.weekly/99-raid-check wrong in rhel/centos-5.4?
>> or what's the problem? can someone enlighten me?
>
> It's a combination of what I would consider a misfeature with what MAY
> BE, and I stress MAY be, a latent bug someplace.
>
> The current RAID/IO stack does not 'pin' pages which are destined to
> be written out to disk. As a result, the contents of the pages may
> change while the request to do I/O against them transits the I/O
> stack down to the disk.

Can you write a bit more about "the pages may change"? Who can change
the page contents?

> This results in a race condition where one side of a RAID1 mirror
> gets one version of the data written to it while the other side of
> the mirror gets a different version. In the case of a swap partition
> this appears to be harmless.
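[For reference, the knob Farkas mentions lives in /etc/sysconfig/raid-check
on RHEL/CentOS 5.4. A minimal sketch of the relevant settings, assuming the
variable names used by the stock raid-check script; md2 and md3 are
placeholder device names, not anything from Farkas's setup:]

```shell
# /etc/sysconfig/raid-check -- sketch only; md2 and md3 are placeholders.
ENABLED=yes
CHECK=check            # "check" scrubs read-only; "repair" rewrites mismatches
# Arrays backing swap, where a non-zero mismatch_cnt appears to be expected:
SKIP_DEVS="md2 md3"
```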
> In the case of filesystems, there seems to be a general assurance
> that this occurs only in uninhabited portions of the filesystem.
>
> The 'check' feature of the MD system, which 99-raid-check uses, reads
> the underlying physical devices of a composite RAID device. The
> mismatch_cnt is elevated if the contents of mirrored sectors are not
> identical.
>
> The results of the intersection of all this are problematic now that
> major distributions have included this raid-check feature. There are
> probably hundreds if not thousands of systems which are reporting
> what may or may not be false positives with respect to data
> corruption.
>
> The current RAID stack has an option to 'repair' a RAID set which has
> mismatches. Unfortunately there is no intelligence in this facility;
> it arbitrarily picks one of the sectors as being 'good' and uses that
> to replace the contents of the other sector. I'm somewhat reluctant
> to recommend the use of this facility given the issues at hand.
>
> A complicating factor is that the kernel does not report the
> locations where the mismatches occur. There appears to be movement
> underway to include support in the kernel for printing out the sector
> locations of the mismatches.
>
> When that feature becomes available there will be a need for some
> type of tool, in the case of RAID1 devices backing filesystems, to
> assess which version of the data is 'correct' so that the faulty
> version can be overwritten with the correct one.
>
> As an aside, what is really needed is a tool which assesses whether
> or not the mismatched sectors are actually in an inhabited portion of
> the filesystem. If not, the 'repair' facility on RAID1 could
> presumably be run with no issues, given appropriate
> coherency/validation checks to make sure the sectors are not still
> incoherent secondary to a race where the uninhabited portion chooses
> to become inhabited.
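[The 'check' and 'repair' actions described above are driven through the md
sysfs interface. A minimal sketch, assuming the standard
/sys/block/<md>/md layout with its sync_action and mismatch_cnt attributes;
this is not the 99-raid-check script itself, just the mechanism it wraps:]

```shell
# Scrub one array and report its mismatch count. Assumes the standard
# md sysfs attributes sync_action and mismatch_cnt; returns non-zero
# if the named device is not an md array.
check_md() {
    md=$1
    sys=/sys/block/$md/md
    if [ ! -d "$sys" ]; then
        echo "$md: not an md device, skipping"
        return 1
    fi
    echo check > "$sys/sync_action"              # start a read-only scrub
    while [ "$(cat "$sys/sync_action")" != idle ]; do
        sleep 10                                 # poll until the scrub finishes
    done
    cat "$sys/mismatch_cnt"                      # sectors that differed
}
```

[Writing "repair" instead of "check" invokes the arbitrary-winner resync
discussed above, so holding off on that until the sector-location reporting
lands seems prudent.]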
> We see the issue over a large range of production systems running
> standard RHEL5 kernels all the way up to recent versions of Fedora.
> Interestingly, the mismatch counts are always an exact multiple of
> 128 on all the systems.
>
> We have also isolated the problem to RAID1, independent of the
> backing store. We run geographic mirrors where an initiator is fed
> from two separate data-centers, with each mirror half based on a
> RAID5 Linux target. On RAID1 mirrors which are mismatched, the two
> separate RAID5 backing volumes both report completely consistent
> volumes.
>
> So there is the situation as I believe it currently stands.
>
> The notion of running the 'check' sync_action is well founded. The
> issue of 'silent' data corruption is well understood and well
> founded. The Linux RAID system, as of a couple of years ago, will
> re-write any sectors which come up as unreadable during the check
> process. Disk drives will then re-allocate such a sector from their
> re-mapping pool, effectively replacing the bad sector. This pays huge
> dividends with respect to maintaining healthy RAID farms.
>
> Unfortunately, the reporting of the mismatch_cnt's is problematic
> given the above issues. I think it is unfortunate that the vendors
> opted to release this checking/reporting while these issues are still
> unresolved.
>
>> thanks in advance.
>> regards.
>>
>> --
>> Levente                           "Si vis pacem para bellum!"
>
> Hope the above information is helpful for everyone running into this
> issue.
>
> Best wishes for a productive remainder of the week to everyone.
>
> Greg
>
> }-- End of excerpt from Farkas Levente
>
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra-structure
> Fargo, ND 58102             development.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
> ------------------------------------------------------------------------------
> "Experience is something you don't get until just after you need it."
>                                     -- Olivier
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Best regards,
[COOLCOLD-RIPN]
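P.S. A quick way to survey Greg's multiple-of-128 observation across a
box's arrays. A sketch only, assuming mismatch_cnt is reported in 512-byte
sectors (so 128 sectors = 64 KiB); the sysfs root is a parameter purely so
the function can be pointed at a fake tree:

```shell
# Report each array's mismatch_cnt and whether it is a multiple of 128.
# The sysfs root defaults to /sys/block but can be overridden so the
# function can be exercised against a fabricated directory tree.
scan_mismatches() {
    base=${1:-/sys/block}
    for f in "$base"/md*/md/mismatch_cnt; do
        [ -e "$f" ] || continue                        # no arrays present
        cnt=$(cat "$f")
        md=$(basename "$(dirname "$(dirname "$f")")")  # e.g. md0
        if [ $((cnt % 128)) -eq 0 ]; then
            echo "$md: $cnt (multiple of 128)"
        else
            echo "$md: $cnt (NOT a multiple of 128)"
        fi
    done
}
```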