On Mon, 03 May 2010 12:04:38 +0200
MRK <mrk@xxxxxxxxxxxxx> wrote:

> On 05/03/2010 04:17 AM, Neil Brown wrote:
> > On Sat, 1 May 2010 23:44:04 +0200
> > "Janos Haar" <janos.haar@xxxxxxxxxxxx> wrote:
> >
> >
> >> The general problem is, I have one single-degraded RAID6 + 2 badblock disk
> >> inside which have bads in different location.
> >> The big question is how to keep the integrity or how to do the rebuild by 2
> >> step instead of one continuous?
> >>
> > Once you have the fix that has already been discussed in this thread, the
> > only other problem I can see with this situation is if attempts to write good
> > data over the read-errors results in a write-error which causes the device to
> > be evicted from the array.
> >
> > And I think you have reported getting write
> > errors.
> >
>
> His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
> error NOT corrected!! " (the error message you get on failed rewrite AFAIU)
> Up to now (after my patch) he only tried with MD above DM-COW and DM was
> dropping the drive on read error so I think MD didn't get any
> opportunity to rewrite.

Hmmm... fair enough.

>
> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place the md_error() is called...

I suspect it is from raid5_end_write_request. It looks like we don't print
any message when the re-write fails, only if the read after the re-write
fails.

> but also in this case it doesn't look like a rewrite error...
>

... so I suspect it is a rewrite error. Unless I missed something.
What message did you expect to see in the case of a re-write error?

> I think without DM COW it should probably work in his case.
>
> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...

Correct.

>
> > The following patch should address this issue for you.
> > It is *not* a general-purpose fix, but a specific fix
> [CUT]

NeilBrown