On Thu, Mar 28, 2019 at 5:16 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> On 3/26/19 7:33 PM, Song Liu wrote:
>
> On Tue, Mar 26, 2019 at 9:08 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> Changing state from check_state_check_result to
> check_state_compute_result not only is unsafe but also doesn't
> appear to serve a valid purpose. A raid6 check should only be
> pushing out extra writes if doing repair and a mismatch occurs.
> The stripe dev management will already try to do repair writes
> for failing sectors.
>
> Could you please explain more details about the failed case before
> this patch?
>
> Thanks,
> Song
>
> Reproduced the issue with a modified scsi_debug module. Redefined
> OPT_MEDIUM_ERR_ADDR to 12000 to move the I/O error where I wanted it
> on the disk. I loaded the modified scsi_debug with the parameter
> dev_size_mb=250 to make it bigger, and partitioned it to build a test
> raid6 device from:
>
> Device      Boot   Start      End   Blocks  Id  System
> /dev/sde1           2048    94207    46080  83  Linux
> /dev/sde2          94208   186367    46080  83  Linux
> /dev/sde3         186368   278527    46080  83  Linux
> /dev/sde4         278528   511999   116736   5  Extended
> /dev/sde5         280576   372735    46080  83  Linux
> /dev/sde6         374784   466943    46080  83  Linux
>
> The raid6 device was created with:
>
> mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sde1
> /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
>
> After initially running, a check can be triggered and it works
> without issue. The medium error feature of scsi_debug can then be
> turned on with:
>
> echo "2" >/sys/module/scsi_debug/parameters/opts
>
> Sector 12000, which is part of sde1 and contains the q parity
> information for its stripe of the raid6 array, will now fail to read.
> Running a new check operation crashes the system at the same BUG_ON
> as seen by the customer.
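>
> Putting those steps together, the whole sequence is roughly the
> following sketch (device names will vary, and the check is kicked
> off through the usual md sync_action interface):
>
>   # load the modified scsi_debug (OPT_MEDIUM_ERR_ADDR redefined to 12000)
>   modprobe scsi_debug dev_size_mb=250
>   # partition the resulting disk as in the table above, then:
>   mdadm --create /dev/md127 --level=6 --raid-devices=5 \
>       /dev/sde1 /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
>   # wait for the initial resync to finish; a check now runs clean
>   echo check > /sys/block/md127/md/sync_action
>   # turn on medium errors so reads of sector 12000 start failing
>   echo 2 > /sys/module/scsi_debug/parameters/opts
>   # re-run the check; without the patch this hits the BUG_ON
>   echo check > /sys/block/md127/md/sync_action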
>
> This patch makes the raid6 check_state_check_result handling
> work more like raid5's. If there are somehow too many failures
> for a check, just quit the check operation for the stripe. When
> any checks pass, don't try to use check_state_compute_result for
> a purpose it isn't needed for and is unsafe for. Just mark the
> stripe as in sync for passing its parity checks and let the
> stripe dev read/write code and the bad blocks list do their
> job handling I/O errors.

+ Dan Williams. But I am not sure whether Dan still remembers this
work. :) Dan, please feel free to ignore this. :)

Thanks Nigel and Xiao for these details. I will process the patch.

Regards,
Song

>
> OriginalAuthor: David Jeffery <djeffery@xxxxxxxxxx>
> Tested-by: David Jeffery <djeffery@xxxxxxxxxx>
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> ---
>  drivers/md/raid5.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c033bfcb209e..4204a972ecf2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4223,26 +4223,15 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
>   case check_state_check_result:
>       sh->check_state = check_state_idle;
>
> +     if (s->failed > 1)
> +         break;
>       /* handle a successful check operation, if parity is correct
>        * we are done.  Otherwise update the mismatch count and repair
>        * parity if !MD_RECOVERY_CHECK
>        */
>       if (sh->ops.zero_sum_result == 0) {
> -         /* both parities are correct */
> -         if (!s->failed)
> -             set_bit(STRIPE_INSYNC, &sh->state);
> -         else {
> -             /* in contrast to the raid5 case we can validate
> -              * parity, but still have a failure to write
> -              * back
> -              */
> -             sh->check_state = check_state_compute_result;
> -             /* Returning at this point means that we may go
> -              * off and bring p and/or q uptodate again so
> -              * we make sure to check zero_sum_result again
> -              * to verify if p or q need writeback
> -              */
> -         }
> +         /* Any parity checked was correct */
> +         set_bit(STRIPE_INSYNC, &sh->state);
>       } else {
>           atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
>           if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
> --
> 2.20.1
>