Re: [PATCH] Don't jump to compute_result state from check_result state

Nigel Croxon <ncroxon@xxxxxxxxxx> · Tue, 16 Apr 2019 16:24:10 -0400

On 4/16/19 1:40 PM, Song Liu wrote:
On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:


On 04/15/2019 03:24 PM, Song Liu wrote:
On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:

On 04/15/2019 01:44 PM, Song Liu wrote:
Hi Nigel,

On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
On 4/9/19 2:09 PM, John Stoffel wrote:
"Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes:
Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
Dan> [..]
To trigger this issue, you not only need a failed disk but to also
perform a scrubbing operation.  The customer's systems both crashed
early Sunday morning when the raid-check script is run by default from cron.
Ok, I follow this, but I come to a different answer on the required
fix. I think it is simply the following to preserve the writeback
action after the parity check, because we need the failed Q slot to be
written back if we're recovering. P will be not up-to-date because it
was checked with the good disks, but sh->ops.zero_sum_result will be
0, so that will skip the writeback of a !uptodate P value.

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..e2eb59289346 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
                    /* now write out any block on a failed drive,
                     * or P or Q if they were recomputed
                     */
-               BUG_ON(s->uptodate < disks - 1); /* We don't need Q to
recover */
Thanks Dan!

Would it make sense to rework the check as

BUG_ON(s->uptodate < disks - 2);
Dan> I think the problem is that any 'uptodate' vs 'disks' check is
Dan> not precise enough in this path. What might be better is to put
Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that
Dan> might try to kick off writes and then skip the action. Better to
Dan> prevent the raid driver from taking unexpected action *and* keep
Dan> the system alive vs killing the machine with BUG_ON.

Dan> BUG_ON has fallen out of favor for exception reporting since
Dan> those assertions were introduced.

And since it' causes the system to crash... it's super annoying when
the rest of the system is working fine.  Please only use a WARN_ON,
and maybe even set the RAID volume readonly, etc.  But don't bring
down the rest of the system if possible.

John
I reverted the first patch as it made its way upstream.

Testing this change now.

---

     drivers/md/raid5.c | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..660ca3af2914 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
             /* now write out any block on a failed drive,
              * or P or Q if they were recomputed
              */
-        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
+        WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */
I think this WARN_ON() is the best way to go, though the comment "don't need Q"
needs some revise.

Nigel, how does this work in your tests?

Thanks,
Song
Could you please share updates about the tests?

Thanks,
Song
Hello Song,
Our customer has confirmed, the issue is fixed. He had run this on two
systems with defective Disks and run raid-check. There were some errors
corrected, the device was degraded, but the kernel never crashed.

-Nigel
Thanks Nigel!

Could you please submit official patch?

Song

             if (s->failed == 2) {
                 dev = &sh->dev[s->failed_num[1]];
                 s->locked++;
--
2.20.1

Song,   You will need to revert 4f4fd7c5798 before applying this:
-Nigel

From: Nigel Croxon <ncroxon@xxxxxxxxxx>
Date: Wed, 10 Apr 2019 09:23:34 -0400
Subject: [PATCH ] preserve the writeback action after the parity check

The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.

Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result
state

Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
---
   drivers/md/raid5.c | 10 +++++++++-
   1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f957b7e42562..4f6b32366c00 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
           /* now write out any block on a failed drive,
            * or P or Q if they were recomputed
            */
-        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
+        dev = NULL;
           if (s->failed == 2) {
               dev = &sh->dev[s->failed_num[1]];
               s->locked++;
@@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
               set_bit(R5_LOCKED, &dev->flags);
               set_bit(R5_Wantwrite, &dev->flags);
           }
+        if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
+                    "%s: disk%ld not up to date\n",
+                    mdname(conf->mddev),
+                    dev - (struct r5dev *) &sh->dev)) {
+            clear_bit(R5_LOCKED, &dev->flags);
+            clear_bit(R5_Wantwrite, &dev->flags);
+            s->locked--;
+        }
           clear_bit(STRIPE_DEGRADED, &sh->state);

           set_bit(STRIPE_INSYNC, &sh->state);
--
2.20.1


Thanks Nigel!

Pushed to https://github.com/liu-song-6/linux/tree/md-next .

Song

From: Nigel Croxon <ncroxon@xxxxxxxxxx>
Date: Tue, 16 Apr 2019 15:37:30 -0400
Subject: [PATCH] preserve the writeback action after the parity check


The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.

V2: change format '%ld' argument of type 'long int', to 'int'.

Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result 
state

Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
---
 drivers/md/raid5.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f957b7e42562..395e7ec0fb6e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf 
*conf, struct stripe_head *sh,
         /* now write out any block on a failed drive,
          * or P or Q if they were recomputed
          */
-        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
+        dev = NULL;
         if (s->failed == 2) {
             dev = &sh->dev[s->failed_num[1]];
             s->locked++;
@@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf 
*conf, struct stripe_head *sh,
             set_bit(R5_LOCKED, &dev->flags);
             set_bit(R5_Wantwrite, &dev->flags);
         }
+        if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
+            "%s: disk%lx not up to date\n",
+            mdname(conf->mddev),
+            dev - (struct r5dev *) &sh->dev)) {
+            clear_bit(R5_LOCKED, &dev->flags);
+            clear_bit(R5_Wantwrite, &dev->flags);
+            s->locked--;
+        }
         clear_bit(STRIPE_DEGRADED, &sh->state);

         set_bit(STRIPE_INSYNC, &sh->state);
--
2.20.1