Re: md: raid5 resync corrects read errors on data block - is this correct?

Alexander Lyakas <alex.bolshoy@xxxxxxxxx> · Mon, 17 Sep 2012 14:15:16 +0300

Hi Neil,
Below is a slightly less ugly version of the patch.
Thanks,
Alex.

From 05cf800d623bf558c99d542cf8bf083c85b7e5d5 Mon Sep 17 00:00:00 2001
From: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
Date: Thu, 13 Sep 2012 18:55:00 +0300
Subject: [PATCH] When RAID5 is dirty, force reconstruct-write instead of
 read-modify-write.

Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
Signed-off-by: Yair Hershko <yair@xxxxxxxxxxxxxxxxx>

diff --git a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
index 5332202..0702785 100644
--- a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
+++ b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
@@ -2555,12 +2555,36 @@ static void handle_stripe_dirtying(struct r5conf *conf,
                                   int disks)
 {
        int rmw = 0, rcw = 0, i;
-       if (conf->max_degraded == 2) {
-               /* RAID6 requires 'rcw' in current implementation
-                * Calculate the real rcw later - for now fake it
+       sector_t recovery_cp = conf->mddev->recovery_cp;
+       unsigned long recovery = conf->mddev->recovery;
+       int needed = test_bit(MD_RECOVERY_NEEDED, &recovery);
+       int resyncing = test_bit(MD_RECOVERY_SYNC, &recovery) &&
+                       !test_bit(MD_RECOVERY_REQUESTED, &recovery) &&
+                       !test_bit(MD_RECOVERY_CHECK, &recovery);
+       int transitional = test_bit(MD_RECOVERY_RUNNING, &recovery) &&
+                          !test_bit(MD_RECOVERY_SYNC, &recovery) &&
+                          !test_bit(MD_RECOVERY_RECOVER, &recovery) &&
+                          !test_bit(MD_RECOVERY_DONE, &recovery) &&
+                          !test_bit(MD_RECOVERY_RESHAPE, &recovery);
+
+       /* RAID6 requires 'rcw' in the current implementation.
+        * Otherwise, check whether a resync is in progress or is about
+        * to start.
+        * If so, the array is dirty (after an unclean shutdown or
+        * initial creation), so parity in some stripes may be inconsistent.
+        * In that case we must always do reconstruct-write, to ensure
+        * that on a drive failure or read-error correction we
+        * regenerate correct data from the parity.
+        */
+       if (conf->max_degraded == 2 ||
+           (recovery_cp < MaxSector && sh->sector >= recovery_cp &&
+            (needed || resyncing || transitional))) {
+               /* Calculate the real rcw later - for now fake it
                 * look like rcw is cheaper
                 */
                rcw = 1; rmw = 2;
+               pr_debug("force RCW max_degraded=%d, recovery_cp=%llu sh->sector=%llu recovery=0x%lx\n",
+                        conf->max_degraded, (unsigned long long)recovery_cp, (unsigned long long)sh->sector, recovery);
        } else for (i = disks; i--; ) {
                /* would I have to read this buffer for read_modify_write */
                struct r5dev *dev = &sh->dev[i];
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html