md: raid5 resync corrects read errors on data block - is this correct?

Hi Neil,
I have been investigating how resync behaves on raid5, and I induced a
correctable read error on the drive that holds a data block of a
particular stripe_head. I saw that resync corrects the read error by
recomputing the missing data block from the rest of the stripe,
rewriting it (twice) and then reading it back.
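
For reference, my understanding of the recomputation is that, since the
parity block is the XOR of all the data blocks in the stripe, any single
missing block is simply the XOR of everything else, data and parity
alike. A minimal userspace sketch of that idea (my own code and names,
nothing taken from the kernel):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096

/*
 * blocks[]  - all blocks of the stripe (data blocks plus the parity block)
 * nblocks   - number of devices in the stripe
 * missing   - index of the block that could not be read
 * out       - receives the recomputed contents of the missing block
 */
static void recompute_block(const uint8_t blocks[][BLOCK_SIZE], int nblocks,
			    int missing, uint8_t out[BLOCK_SIZE])
{
	for (size_t i = 0; i < BLOCK_SIZE; i++) {
		uint8_t x = 0;

		for (int d = 0; d < nblocks; d++)
			if (d != missing)	/* skip the unreadable device */
				x ^= blocks[d][i];
		out[i] = x;	/* missing block = XOR of all the others */
	}
}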

Then I dug into the code and eventually found this check in fetch_block():

if ((s->uptodate == disks - 1) &&
    (s->failed && (disk_idx == s->failed_num[0] ||
		   disk_idx == s->failed_num[1]))) {
	/* have disk failed, and we're requested to fetch it;
	 * do compute it
	 */

This check does not care whether disk_idx holds data or parity for this
stripe_head; it only verifies that the stripe_head has enough redundancy
left to recompute the block.
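
To my reading, the condition reduces to something like the following (a
hypothetical restatement of the quoted code, just to make explicit what
it tests and what it does not):

static int can_compute_block(int uptodate, int disks, int failed,
			     int failed_num0, int failed_num1, int disk_idx)
{
	/* every other block of the stripe is already up to date ... */
	if (uptodate != disks - 1)
		return 0;

	/* ... and disk_idx is one of the failed devices.  Nothing here
	 * distinguishes a data block from the parity block, and nothing
	 * asks whether the stripe's parity is known to be consistent.
	 */
	return failed && (disk_idx == failed_num0 || disk_idx == failed_num1);
}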

My question: is this behavior correct? If we are doing a resync, we are
not sure that the parity is correct (e.g. after an unclean shutdown),
yet we still use this possibly-incorrect parity to recompute the missing
data block and write it back. Is this a potential bug?
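
To make the concern concrete, here is a toy example (again my own, not
kernel code) of a 3-device stripe where the last data write reached the
disk but the matching parity update did not, e.g. because of the unclean
shutdown, so the on-disk parity is stale:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint8_t d0 = 0x11;		/* data block that will get the read error */
	uint8_t d1 = 0x22;		/* data block whose last write did hit the disk */
	uint8_t p_stale = d0 ^ 0x99;	/* parity still reflects d1's old value 0x99 */

	/* During resync, d0 gets a read error and the check above lets us
	 * recompute it from d1 and the possibly-stale parity:
	 */
	uint8_t d0_recomputed = d1 ^ p_stale;

	/* prints 0x11 vs 0xaa - the recomputed (and rewritten) block is wrong */
	printf("original d0 = 0x%02x, recomputed d0 = 0x%02x\n",
	       (unsigned)d0, (unsigned)d0_recomputed);
	return 0;
}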

Thanks,
Alex.

