Kindly follow some mailing list etiquette. Do not jump into unrelated
threads without knowing what is being discussed.

On Thu, Dec 3, 2009 at 2:41 PM, CoolCold <coolthecold@xxxxxxxxx> wrote:
> On Wed, Dec 2, 2009 at 6:38 PM, <greg@xxxxxxxxxxxx> wrote:
>> On Nov 30, 2:08pm, Farkas Levente wrote:
>> } Subject: /etc/cron.weekly/99-raid-check
>>
>>> hi,
>>
>> Hi Farkas, hope your day is going well. Just thought I would respond
>> for the edification of others who are troubled by this issue.
>>
>>> it's been a few weeks since RHEL/CentOS 5.4 was released and there has
>>> been much discussion about this new "feature", the weekly RAID
>>> partition check. we've got lots of servers with RAID1 systems and I
>>> have already tried to configure them not to send these messages, but
>>> I'm not able to, i.e. I've already added all of my swap partitions to
>>> SKIP_DEVS (since I read on the linux-kernel list that swap can show a
>>> non-zero mismatch_cnt, even though I still don't understand why). but
>>> even the data partitions (i.e. all RAID1 partitions on all of my
>>> servers) produce this error (i.e. their mismatch_cnt is never 0 at the
>>> weekend), and this causes all of my RAID1 partitions to be rebuilt
>>> during the weekend. and I don't like it :-(
>>> so my questions:
>>> - is it a real bug in the RAID1 system?
>>> - is it a real bug in my disks running the RAID (I don't really
>>>   believe so, since it's dozens of servers)?
>>> - is /etc/cron.weekly/99-raid-check itself wrong in RHEL/CentOS 5.4?
>>> or what's the problem? can someone enlighten me?
>>
>> It's a combination of what I would consider a misfeature with what MAY
>> be, and I stress MAY be, a genuine bug someplace.
>>
>> The current RAID/IO stack does not 'pin' pages which are destined to
>> be written out to disk. As a result the contents of the pages may
>> change while the request to do I/O against those pages transits the
>> I/O stack down to disk.
>
> Can you write a bit more about "the pages may change"? 'Who' can
> change page contents?
>
>> This results in a 'race' condition where one side of a RAID1 mirror
>> gets one version of the data written to it while the other side of the
>> mirror gets a different version. In the case of a swap partition this
>> appears to be harmless. In the case of filesystems there seems to be a
>> general assurance that this occurs only in uninhabited portions of the
>> filesystem.
>>
>> The 'check' feature of the MD system, which 99-raid-check uses, reads
>> the underlying physical devices of a composite RAID device. The
>> mismatch_cnt is elevated if the contents of mirrored sectors are not
>> identical.
>>
>> The intersection of all this is problematic now that major
>> distributions have included this raid-check feature. There are
>> probably hundreds if not thousands of systems reporting what may or
>> may not be false positives with respect to data corruption.
>>
>> The current RAID stack has an option to 'repair' a RAID set which has
>> mismatches. Unfortunately there is no intelligence in this facility:
>> it arbitrarily picks one of the sectors as being 'good' and uses that
>> to replace the contents of the other sector. I'm somewhat reluctant to
>> recommend the use of this facility given the issues at hand.
>>
>> A complicating factor is that the kernel does not report the locations
>> where the mismatches occur. There appears to be movement underway to
>> include support in the kernel for printing out the sector locations of
>> the mismatches.
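>>
>> For anyone who wants to poke at this by hand in the meantime, here is
>> a minimal sketch using the same md sysfs interface the weekly script
>> drives (md0 is a placeholder device name; run as root):
>>
>>     # Kick off a scrub pass on one array.
>>     echo check > /sys/block/md0/md/sync_action
>>     # sync_action reads back the running action ('idle' when done).
>>     while grep -q check /sys/block/md0/md/sync_action; do sleep 60; done
>>     # A non-zero count here is what the cron job complains about.
>>     cat /sys/block/md0/md/mismatch_cnt
>>     # 'repair' rewrites mismatched regions, but as noted above it
>>     # picks the 'good' copy with no intelligence:
>>     # echo repair > /sys/block/md0/md/sync_action
>>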
>> When that feature becomes available there will be a need for some
>> type of tool, in the case of RAID1 devices backing filesystems, to
>> assess which version of the data is 'correct' so that the faulty
>> version can be over-written with the correct one.
>>
>> As an aside, what is really needed is a tool which assesses whether or
>> not the mismatched sectors are actually in an inhabited portion of the
>> filesystem (a rough sketch of such a check follows at the end of this
>> message). If they are not, the 'repair' facility on RAID1 could
>> presumably be run with no issues, given appropriate
>> coherency/validation checks to guard against the race where an
>> uninhabited portion of the filesystem becomes inhabited in the
>> meantime.
>>
>> We see the issue over a large range of production systems running
>> standard RHEL5 kernels all the way up to recent versions of Fedora.
>> Interestingly, the mismatch counts are always an exact multiple of 128
>> on all of the systems.
>>
>> We have also isolated the problem to RAID1, independent of the backing
>> store. We run geographic mirrors where an initiator is fed from two
>> separate data centers and each mirror half is backed by a RAID5 Linux
>> target. On RAID1 mirrors which show mismatches, both of the separate
>> RAID5 backing volumes report themselves completely consistent.
>>
>> So there is the situation as I believe it currently stands.
>>
>> The notion of running the 'check' sync_action is well founded. The
>> issue of 'silent' data corruption is well understood. As of a couple
>> of years ago the Linux RAID system will re-write any sectors which
>> come up as unreadable during the check process, and the disk drive
>> will then re-allocate a sector from its re-mapping pool, effectively
>> replacing the bad sector. This pays huge dividends with respect to
>> maintaining healthy RAID farms.
>>
>> Unfortunately, the reporting of mismatch_cnt is problematic given the
>> issues above. I think it is unfortunate that the vendors opted to ship
>> this checking/reporting while these issues are still unresolved.
>>
>>> thanks in advance.
>>> regards.
>>>
>>> --
>>> Levente             "Si vis pacem para bellum!"
>>
>> Hope the above information is helpful for everyone running into this
>> issue.
>>
>> Best wishes for a productive remainder of the week to everyone.
>>
>> Greg
>>
>> }-- End of excerpt from Farkas Levente
>>
>> As always,
>> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
>> 4206 N. 19th Ave.           Specializing in information infra-structure
>> Fargo, ND 58102             development.
>> PH: 701-281-1686
>> FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
>> ------------------------------------------------------------------------------
>> "Experience is something you don't get until just after you need it."
>>     -- Olivier
>
> --
> Best regards,
> [COOLCOLD-RIPN]

--
Sujit K M
blog(http://kmsujit.blogspot.com/)
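
As a rough sketch of the "inhabited portion" assessment described above
(this is only an illustration: it assumes an ext3 filesystem sitting
directly on /dev/md0, takes a 512-byte-sector offset the kernel has
already reported for the mismatch, and ignores filesystem metadata
blocks, which are a separate question):

    # Hypothetical example: sector 262144 of md0 reported as mismatched.
    SECTOR=262144
    # Translate the sector offset into a filesystem block number.
    BLKSZ=$(dumpe2fs -h /dev/md0 2>/dev/null | awk '/^Block size:/ {print $3}')
    FSBLOCK=$(( SECTOR * 512 / BLKSZ ))
    # Ask ext3 which inode, if any, owns that block.  An answer of
    # "<block not found>" means no file claims it, i.e. the mismatch is
    # in uninhabited space and a 'repair' pass cannot clobber live data.
    debugfs -R "icheck $FSBLOCK" /dev/md0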