On Thu, 2023-11-02 at 18:15 -0400, Benjamin Marzinski wrote: > This option lets multipath set a scsi disk's max_retries sysfs value. > Setting this can be helpful for cases where the path checker > succeeds, > but IO commands hang and timeout. By default, the SCSI layer will > retry > IOs 5 times. Reducing this value will allow multipath to retry the IO > down another path sooner. > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> 2 nitpicks below. Please explain to me again why we recommend to activate shaky paths detection with this. What will go wrong if the user uses max_retries without shaky path detection? > --- > libmultipath/config.c | 3 +++ > libmultipath/config.h | 3 +++ > libmultipath/dict.c | 34 ++++++++++++++++++++++++++++ > libmultipath/discovery.c | 42 > ++++++++++++++++++++++++++++++++++- > libmultipath/propsel.c | 18 +++++++++++++++ > libmultipath/propsel.h | 1 + > libmultipath/structs.h | 7 ++++++ > multipath/multipath.conf.5.in | 22 ++++++++++++++++++ > 8 files changed, 129 insertions(+), 1 deletion(-) > > diff --git a/libmultipath/dict.c b/libmultipath/dict.c > index 044067af..fc438947 100644 > --- a/libmultipath/dict.c > +++ b/libmultipath/dict.c > @@ -1152,6 +1152,36 @@ declare_hw_snprint(eh_deadline, > print_undef_off_zero) > declare_pc_handler(eh_deadline, set_undef_off_zero) > declare_pc_snprint(eh_deadline, print_undef_off_zero) > > +static int > +set_max_retries(vector strvec, void *ptr, const char *file, int > line_nr) > +{ > + char * buff; > + int *int_ptr = (int *)ptr; > + > + buff = set_value(strvec); > + if (!buff) > + return 1; > + > + if (strcmp(buff, "off") == 0) > + *int_ptr = UOZ_OFF; > + else if (strcmp(buff, "0") == 0) > + *int_ptr = UOZ_ZERO; Why not use MAX_RETRIES_OFF and MAX_RETRIES_ZERO here? > + else > + do_set_int(strvec, int_ptr, 1, 5, file, line_nr, > buff); > + > + free(buff); > + return 0; > +} > + > diff --git a/multipath/multipath.conf.5.in > b/multipath/multipath.conf.5.in > index 226d0019..41f3927e 100644 > --- a/multipath/multipath.conf.5.in > +++ b/multipath/multipath.conf.5.in > @@ -793,6 +793,22 @@ The default is: \fB<unset>\fR > . > . > .TP > +.B max_retries > +Specify the maximum number of times the SCSI layer will retry IO > commands > +before returning failure. Setting this can be helpful for cases > where the path > +checker succeeds, but IO commands hang and timeout. By default, the > SCSI layer > +will retry IOs 5 times. Reducing this value will allow multipath to > retry the IO > +down another path sooner. \fBNote:\fR If it is necessary to set this Maybe you should add "[retry IO commands] for some types of SCSI errors". Otherwise users might get their hopes too high. IIRC the main cases that matter here are DID_TRANSPORT_DISRUPTED and perhaps DID_BUS_BUSY (don't put that in the man page) ;-) > value, it > +is also recommended to set up shaky paths detection. See "Shaky > paths detection" > +below. Valid values are > +\fB0\fR through \fB5\fR. > +.RS > +.TP > +The default is: \fB<unset>\fR > +.RE