Re: [PATCH] raid456: avoid second retry of read-error

Song Liu <liu.song.a23@xxxxxxxxx> · Tue, 5 Nov 2019 13:14:30 -0800

On Mon, Nov 4, 2019 at 12:02 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> The MD driver for level-456 should prevent re-reading read errors.
>
> For redundant raid it makes no sense to retry the operation:
> When one of the disks in the array hits a read error, that will
> cause a stall for the reading process:
> - either the read succeeds (e.g. after 4 seconds the HDD error
> strategy could read the sector)
> - or it fails after HDD imposed timeout (w/TLER, e.g. after 7
> seconds (might be even longer)
>
> The user can enable/disable this functionality by the following
> commands:
> To Enable:
> echo 1 > /proc/sys/dev/raid/raid456_retry_read_error
>
> To Disable, type the following at anytime:
> echo 0 > /proc/sys/dev/raid/raid456_retry_read_error
>
> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> ---
>  drivers/md/md.c    | 43 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/md/md.h    |  3 +++
>  drivers/md/raid5.c |  3 ++-
>  3 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6f0ecfe8eab2..75b8b0615328 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -125,6 +125,12 @@ static inline int speed_max(struct mddev *mddev)
>                 mddev->sync_speed_max : sysctl_speed_limit_max;
>  }
>
> +static int sysctl_raid456_retry_read_error = 0;
> +static inline void set_raid456_retry_re(struct mddev *mddev, int re)
> +{
> +       (re ? set_bit : clear_bit)(MD_RAID456_RETRY_RE, &mddev->flags);

Let's just keep this
if (re)
     set_bit(...);
else
     clear_bit(..);

> +}
> +
>  static int rdev_init_wb(struct md_rdev *rdev)
>  {
>         if (rdev->bdev->bd_queue->nr_hw_queues == 1)
> @@ -213,6 +219,13 @@ static struct ctl_table raid_table[] = {
>                 .mode           = S_IRUGO|S_IWUSR,
>                 .proc_handler   = proc_dointvec,
>         },
> +       {
> +               .procname       = "raid456_retry_read_error",
> +               .data           = &sysctl_raid456_retry_read_error,
> +               .maxlen         = sizeof(int),
> +               .mode           = S_IRUGO|S_IWUSR,
> +               .proc_handler   = proc_dointvec,
> +       },
>         { }
>  };

How about we add this to md_redundancy_attrs instead?

Thanks,
Song