But is this not a good opportunity to repair the bad stripe at very low
cost (no complete resync required)? At the time of the error we actually
know which disk failed and can re-write it, something we do not know at
resync time, which is why I assume a repair always writes to the parity
disk. (The check/repair cycle Justin describes is sketched at the end of
this message.)

Justin Piszcz wrote:
> Should the raid have noticed the error, checked the offending
> stripe and taken appropriate action? The messages from that error
> are below.
>
> I don't think so; that is why we need to run a check every once in
> a while and check the mismatch_cnt file for each md raid device.
>
> Run repair, then re-run check to verify the count goes back to 0.
>
> Justin.
>
> On Sat, 24 Feb 2007, Eyal Lebedinsky wrote:
>
>> I run a 'check' weekly, and yesterday it came up with a non-zero
>> mismatch count (184). There were no earlier RAID errors logged
>> and the count was zero after the run a week ago.
>>
>> Now, the interesting part is that there was one i/o error logged
>> during the check *last week*; however, the raid did not see it and
>> the count was zero at the end. No errors were logged during the
>> week since, or during the check last night.
>>
>> fsck (ext3 with logging) found no errors, but I may have bad data
>> somewhere.
>>
>> Should the raid have noticed the error, checked the offending
>> stripe and taken appropriate action? The messages from that error
>> are below.
>>
>> Naturally, I do not know if the mismatch is related to the failure
>> last week; it could be due to a number of other causes (bad
>> memory? kernel bug?).
>>
>> System details:
>> 2.6.20 vanilla
>> /dev/sd[ab]: on motherboard
>>   IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150
>>   Storage Controller (rev 02)
>> /dev/sd[cdef]: Promise SATA-II-150-TX4
>>   Unknown mass storage controller: Promise Technology, Inc.:
>>   Unknown device 3d18 (rev 02)
>> All 6 disks are WD 320GB SATA drives of similar models
>>
>> Tail of dmesg, showing all messages since last week's 'check':
>>
>> *** last week check start:
>> [927080.617744] md: data-check of RAID array md0
>> [927080.630783] md: minimum _guaranteed_ speed: 24000 KB/sec/disk.
>> [927080.648734] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
>> [927080.678103] md: using 128k window, over a total of 312568576 blocks.
>> *** last week error:
>> [937567.332751] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4190002 action 0x2
>> [937567.354094] ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
>> [937567.354096]          res 51/04:83:45:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
>> [937568.120783] ata3: soft resetting port
>> [937568.282450] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [937568.306693] ata3.00: configured for UDMA/100
>> [937568.319733] ata3: EH complete
>> [937568.361223] SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
>> [937568.397207] sdc: Write Protect is off
>> [937568.408620] sdc: Mode Sense: 00 3a 00 00
>> [937568.453522] SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> *** last week check end:
>> [941696.843935] md: md0: data-check done.
>> [941697.246454] RAID5 conf printout:
>> [941697.256366] --- rd:6 wd:6
>> [941697.264718] disk 0, o:1, dev:sda1
>> [941697.275146] disk 1, o:1, dev:sdb1
>> [941697.285575] disk 2, o:1, dev:sdc1
>> [941697.296003] disk 3, o:1, dev:sdd1
>> [941697.306432] disk 4, o:1, dev:sde1
>> [941697.316862] disk 5, o:1, dev:sdf1
>> *** this week check start:
>> [1530647.746383] md: data-check of RAID array md0
>> [1530647.759677] md: minimum _guaranteed_ speed: 24000 KB/sec/disk.
>> [1530647.778041] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
>> [1530647.807663] md: using 128k window, over a total of 312568576 blocks.
>> *** this week check end:
>> [1545248.680745] md: md0: data-check done.
>> [1545249.266727] RAID5 conf printout:
>> [1545249.276930] --- rd:6 wd:6
>> [1545249.285542] disk 0, o:1, dev:sda1
>> [1545249.296228] disk 1, o:1, dev:sdb1
>> [1545249.306923] disk 2, o:1, dev:sdc1
>> [1545249.317613] disk 3, o:1, dev:sdd1
>> [1545249.328292] disk 4, o:1, dev:sde1
>> [1545249.338981] disk 5, o:1, dev:sdf1

--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx) <http://samba.org/eyal/>
	attach .zip as .dat
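P.S. For reference, the weekly cycle Justin describes (check, inspect
mismatch_cnt, repair, re-run check) can be driven from the md sysfs
interface. Below is a minimal sketch, assuming a single array at
/sys/block/md0/md/ and root privileges; the device name, polling
interval, and lack of error handling are illustrative, not a tested
tool:

#!/usr/bin/env python3
# Minimal sketch of the check/repair cycle discussed above.
# Assumes the md sysfs interface (/sys/block/<dev>/md/) and root
# privileges; device name and polling interval are illustrative.
import time

DEV = "md0"  # hypothetical: adjust for each md raid device

def attr(name):
    return "/sys/block/%s/md/%s" % (DEV, name)

def run_action(action):
    # Start a 'check' or 'repair' pass, then poll sync_action until
    # the array returns to 'idle'.
    with open(attr("sync_action"), "w") as f:
        f.write(action)
    while open(attr("sync_action")).read().strip() != "idle":
        time.sleep(60)

def mismatch_cnt():
    return int(open(attr("mismatch_cnt")).read().strip())

run_action("check")
count = mismatch_cnt()
if count:
    print("%s: mismatch_cnt = %d, running repair" % (DEV, count))
    run_action("repair")
    run_action("check")  # re-run check; the count should return to 0
    print("%s: mismatch_cnt now %d" % (DEV, mismatch_cnt()))
else:
    print("%s: no mismatches" % DEV)

The same can of course be done from the shell with
'echo check > /sys/block/md0/md/sync_action' and a
'cat /sys/block/md0/md/mismatch_cnt' once the array is idle again.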