Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.

Paul Clements <paul.clements@xxxxxxxxxxxx> · Tue, 29 Nov 2005 11:38:51 -0500

Hi Neil,

Glad to see this patch is making its way to mainline. I have a couple of 
questions on the patch, though...

NeilBrown wrote:

+	if (uptodate || conf->working_disks <= 1) {

Is it valid to mask a read error just because we have only 1 working disk?

+				do {
+					rdev = conf->mirrors[d].rdev;
+					if (rdev &&
+					    test_bit(In_sync, &rdev->flags) &&
+					    sync_page_io(rdev->bdev,
+							 sect + rdev->data_offset,
+							 s<<9,
+							 conf->tmppage, READ))
+						success = 1;
+					else {
+						d++;
+						if (d == conf->raid_disks)
+							d = 0;
+					}
+				} while (!success && d != r1_bio->read_disk);
+
+				if (success) {
+					/* write it back and re-read */
+					while (d != r1_bio->read_disk) {

Here, it looks like if we retry the read on the same disk that just gave 
the read error, then we will not do any re-writes? I assume that is 
intentional? I guess it's a judgment call whether the sector is really 
bad at that point.

+						if (d==0)
+							d = conf->raid_disks;
+						d--;
+						rdev = conf->mirrors[d].rdev;
+						if (rdev &&
+						    test_bit(In_sync, &rdev->flags)) {
+							if (sync_page_io(rdev->bdev,
+									 sect + rdev->data_offset,
+									 s<<9, conf->tmppage, WRITE) == 0 ||
+							    sync_page_io(rdev->bdev,
+									 sect + rdev->data_offset,
+									 s<<9, conf->tmppage, READ) == 0) {
+								/* Well, this device is dead */
+								md_error(mddev, rdev);

Here, we might have gotten garbage back from the sync_page_io(..., 
READ), if it failed. So don't we have to quit the re-write loop at this 
point? Otherwise, aren't we potentially writing bad data over other 
disks? Granted, this particular case might never actually happen in the 
real world.

Thanks,
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html