On Wed, Nov 08, 2023 at 03:36:14PM +0000, Martin Wilck wrote: > On Thu, 2023-11-02 at 18:15 -0400, Benjamin Marzinski wrote: > > This option lets multipath set a scsi disk's max_retries sysfs value. > > Setting this can be helpful for cases where the path checker > > succeeds, > > but IO commands hang and timeout. By default, the SCSI layer will > > retry > > IOs 5 times. Reducing this value will allow multipath to retry the IO > > down another path sooner. > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > > 2 nitpicks below. Please explain to me again why we recommend to > activate shaky paths detection with this. What will go wrong if the > user uses max_retries without shaky path detection? In the case were the path_checker keeps succeeding, but the IOs keep hanging, multipathd will just keep restoring this path over and over again. That's the sort of path ping-ponging that shaky path detection should be able to stop. I guess this is can speed up other cases for failover as well, so I can leave that off. Hopefully people know that if they are seeing ping-ponging, shaky path detection can help with it. > > > --- > > libmultipath/config.c | 3 +++ > > libmultipath/config.h | 3 +++ > > libmultipath/dict.c | 34 ++++++++++++++++++++++++++++ > > libmultipath/discovery.c | 42 > > ++++++++++++++++++++++++++++++++++- > > libmultipath/propsel.c | 18 +++++++++++++++ > > libmultipath/propsel.h | 1 + > > libmultipath/structs.h | 7 ++++++ > > multipath/multipath.conf.5.in | 22 ++++++++++++++++++ > > 8 files changed, 129 insertions(+), 1 deletion(-) > > > > diff --git a/libmultipath/dict.c b/libmultipath/dict.c > > index 044067af..fc438947 100644 > > --- a/libmultipath/dict.c > > +++ b/libmultipath/dict.c > > @@ -1152,6 +1152,36 @@ declare_hw_snprint(eh_deadline, > > print_undef_off_zero) > > declare_pc_handler(eh_deadline, set_undef_off_zero) > > declare_pc_snprint(eh_deadline, print_undef_off_zero) > > > > +static int > > +set_max_retries(vector strvec, void *ptr, const char *file, int > > line_nr) > > +{ > > + char * buff; > > + int *int_ptr = (int *)ptr; > > + > > + buff = set_value(strvec); > > + if (!buff) > > + return 1; > > + > > + if (strcmp(buff, "off") == 0) > > + *int_ptr = UOZ_OFF; > > + else if (strcmp(buff, "0") == 0) > > + *int_ptr = UOZ_ZERO; > > Why not use MAX_RETRIES_OFF and MAX_RETRIES_ZERO here? No good reason. I can change that. > > + else > > + do_set_int(strvec, int_ptr, 1, 5, file, line_nr, > > buff); > > + > > + free(buff); > > + return 0; > > +} > > + > > > diff --git a/multipath/multipath.conf.5.in > > b/multipath/multipath.conf.5.in > > index 226d0019..41f3927e 100644 > > --- a/multipath/multipath.conf.5.in > > +++ b/multipath/multipath.conf.5.in > > @@ -793,6 +793,22 @@ The default is: \fB<unset>\fR > > . > > . > > .TP > > +.B max_retries > > +Specify the maximum number of times the SCSI layer will retry IO > > commands > > +before returning failure. Setting this can be helpful for cases > > where the path > > +checker succeeds, but IO commands hang and timeout. By default, the > > SCSI layer > > +will retry IOs 5 times. Reducing this value will allow multipath to > > retry the IO > > +down another path sooner. \fBNote:\fR If it is necessary to set this > > Maybe you should add "[retry IO commands] for some types of SCSI > errors". Otherwise users might get their hopes too high. IIRC the main > cases that matter here are DID_TRANSPORT_DISRUPTED and perhaps > DID_BUS_BUSY (don't put that in the man page) ;-) Sure. -Ben > > > > value, it > > +is also recommended to set up shaky paths detection. See "Shaky > > paths detection" > > +below. Valid values are > > +\fB0\fR through \fB5\fR. > > +.RS > > +.TP > > +The default is: \fB<unset>\fR > > +.RE > >