Re: [PATCH] Don't jump to compute_result state from check_result state

Nigel Croxon <ncroxon@xxxxxxxxxx> · Wed, 10 Apr 2019 13:05:29 -0400

On 4/9/19 7:41 PM, Dan Williams wrote:
On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:

On 4/9/19 2:09 PM, John Stoffel wrote:
"Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes:
Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
Dan> [..]
To trigger this issue, you not only need a failed disk but to also
perform a scrubbing operation.  The customer's systems both crashed
early Sunday morning when the raid-check script is run by default from cron.
Ok, I follow this, but I come to a different answer on the required
fix. I think it is simply the following to preserve the writeback
action after the parity check, because we need the failed Q slot to be
written back if we're recovering. P will be not up-to-date because it
was checked with the good disks, but sh->ops.zero_sum_result will be
0, so that will skip the writeback of a !uptodate P value.

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..e2eb59289346 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
                  /* now write out any block on a failed drive,
                   * or P or Q if they were recomputed
                   */
-               BUG_ON(s->uptodate < disks - 1); /* We don't need Q to
recover */
Thanks Dan!

Would it make sense to rework the check as

BUG_ON(s->uptodate < disks - 2);
Dan> I think the problem is that any 'uptodate' vs 'disks' check is
Dan> not precise enough in this path. What might be better is to put
Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that
Dan> might try to kick off writes and then skip the action. Better to
Dan> prevent the raid driver from taking unexpected action *and* keep
Dan> the system alive vs killing the machine with BUG_ON.

Dan> BUG_ON has fallen out of favor for exception reporting since
Dan> those assertions were introduced.

And since it' causes the system to crash... it's super annoying when
the rest of the system is working fine.  Please only use a WARN_ON,
and maybe even set the RAID volume readonly, etc.  But don't bring
down the rest of the system if possible.

John
I reverted the first patch as it made its way upstream.

Testing this change now.

---

   drivers/md/raid5.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..660ca3af2914 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
           /* now write out any block on a failed drive,
            * or P or Q if they were recomputed
            */
-        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
+        WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */
If we're going to warn at all and keep the driver running I'd prefer
that the warning include information that would allow effective debug,
and take steps to minimize the damage. Something like this (please
forgive the whitespace damage). Note the usage of WARN_ONCE on the
thought that if this triggers it could get noisy and the log printing
could make the system unusable.

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c033bfcb209e..5b27b75b51b1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
                 /* now write out any block on a failed drive,
                  * or P or Q if they were recomputed
                  */
-               BUG_ON(s->uptodate < disks - 1); /* We don't need Q to
recover */
+               dev = NULL;
                 if (s->failed == 2) {
                         dev = &sh->dev[s->failed_num[1]];
                         s->locked++;
@@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf
*conf, struct stripe_head *sh,
                         set_bit(R5_LOCKED, &dev->flags);
                         set_bit(R5_Wantwrite, &dev->flags);
                 }
+               if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
+                                       "%s: disk%ld not up to date\n",
+                                       mdname(conf->mddev),
+                                       dev - (struct r5dev *) &sh->dev)) {
+                       clear_bit(R5_LOCKED, &dev->flags);
+                       clear_bit(R5_Wantwrite, &dev->flags);
+                       s->locked--;
+               }
                 clear_bit(STRIPE_DEGRADED, &sh->state);

                 set_bit(STRIPE_INSYNC, &sh->state);



           if (s->failed == 2) {
               dev = &sh->dev[s->failed_num[1]];
               s->locked++;
--
2.20.1

As implemented, handle_parity_checks6() makes no guarantees that a 
failed device will have "uptodate" values when jumping from 
checks_result straight to compute_result state.