On 09/01/17 08:06, Wols Lists wrote:
> On 08/01/17 20:46, Piergiorgio Sartor wrote:
[trim]
> If one of the parity sectors is corrupted, it's easy. Calculate parity from the data, and either P or Q will be wrong, so fix it. But if it's a *data* sector that's corrupted, both P and Q will be wrong. How easy is it to work back from that, and work out *which* data sector is wrong? My fu makes me think you can't, though I could quite easily be wrong :-)
My understanding of RAID6 is that you CAN say which of the data/P/Q is wrong, if one assumes only one of them is wrong. Is this not what raid6check claims to do? "In case of parity mismatches, 'raid6check' reports, if possible, which component drive could be responsible."
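For what it's worth, the location math is straightforward: recompute P and Q from the data as read, XOR against the stored P and Q to get two syndromes, and (assuming a single bad component) the ratio of the syndromes in GF(2^8) identifies the drive. A minimal Python sketch of the idea — my own toy reimplementation, not the actual md or raid6check code:

```python
# Toy RAID6 single-error location in GF(2^8), using the RAID6
# polynomial 0x11d and generator g = 2 (same field as Linux md).

# Build exp/log tables for GF(2^8).
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]   # wrap so EXP[a + b] works without mod

def locate_error(data, p_stored, q_stored):
    """data: one byte per data drive at the same stripe offset.
    Returns 'ok', 'P', 'Q', or the index of the suspect data drive,
    assuming at most one component is wrong."""
    p_calc = 0
    q_calc = 0
    for i, d in enumerate(data):
        p_calc ^= d
        q_calc ^= EXP[LOG[d] + i] if d else 0   # g^i * d in GF(2^8)
    sp = p_calc ^ p_stored
    sq = q_calc ^ q_stored
    if sp == 0 and sq == 0:
        return 'ok'
    if sp == 0:
        return 'Q'          # only Q disagrees
    if sq == 0:
        return 'P'          # only P disagrees
    # If data drive z holds error E, then sp = E and sq = g^z * E,
    # so z = log(sq) - log(sp) mod 255.
    z = (LOG[sq] - LOG[sp]) % 255
    return z if z < len(data) else 'unlocatable'
```

Of course this only locates the error under the single-corruption assumption; with two or more bad components the syndromes can point at an innocent drive, which is presumably why raid6check hedges with "could be responsible".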
> But should that even happen, unless a disk is on its way out, anyway?
Not so. I get, from time to time, non-zero mismatch counts where I saw no disk errors of any sort in the kernel messages or in SMART status.
> I remember years ago, back in the 80s, our minicomputers had error-correction in the drive. I don't remember the algorithm, but it wrote a 16-bit word to disk for each 8-bit data byte. The first half was the original data, and the second half was some parity pattern such that for any single-bit corruption you knew which half was corrupt, so you could either throw away the corrupt parity or recreate the correct data from it. Even with a 2-bit error I think it was >90% detection and recreation. I can't imagine something like that not being in drive hardware today.
The disk thinks it has good data but md thinks not. Maybe bad data was written due to some other bug? A corner case when the system rebooted unexpectedly? Maybe the controller corrupted the data?
> Cheers,
> Wol
--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html