This did work, but I agree, it is not the best approach. I was looking for a backstop in case I could not get the kernel change that was the real fix to the issue. I wanted to give a solution to our customers who would upgrade into the problem. Having multitpath-tools dealing with a fail path that isn't really a fail path is clumsy at best. The kernel fix got in here: https://git.kernel.org/mkp/scsi/c/6056a92ceb2a So this patch is no longer needed. Thanks, Brian On Fri, Apr 29, 2022 at 7:31 PM Benjamin Marzinski <bmarzins@xxxxxxxxxx> wrote: > > On Mon, Apr 18, 2022 at 02:54:15PM -0700, Brian Bunker wrote: > > With the kernel commit 268940b80fa4096397fd0a28e6ad807e64120215, the > > handling of ALUA state transitioning state changed. It used to be that > > paths which returned with ALUA transitioning state would not result in > > those paths being failed. With this patch, that state returned will > > result in the path being failed. The result of this will result in all > > paths down events when a configuration has the no_path_retry set to 0. > > Before this kernel change that same configuration would not reach all > > paths down. > > > > If the kernel is not changed back to the previous behavior, the > > multipath-tools have to protect against this condition or > > configurations which were working correctly prior to the kernel change > > will lead to an unexpected failure. > > Unless I'm missing something, this patch isn't going to be helpful. It > doesn't configure the multipath device to queue IOs before the failure. > When the kernel runs out of paths, it will see that the mutltipath > device has no paths and isn't set to queue, and it will fail the IOs. > After the paths come back up again, multipath will enable queueing in > update_queue_mode_add_path(). But it's too late by then. The IOs will > have already been failed. > > Another issue is that the patch is doing scsi specific work in code > meant for general block devices. Finally, It's seems pointless to set > the device to queue when all the paths go away and the turn off queueing > when the paths come back. no_path_retry doesn't do anything when there > are paths, so if it's necessary to set when the paths are gone, it won't > hurt anything to be set when the paths are present. I understand that > you only want to do this for devices in the ALUA transtitioning state, > but the only why to make this actually help is if multipathd enables > queueing before the device ever gets into that state. > > The only really safe thing to do would be to set no_path_retry to > something other than 0 or NO_PATH_RETRY_QUEUE. This can already be done > in the configuration files. But I agree that failing in IOs because the > device is transitioning seems wrong in the first place, and fixing this > in the kernel makes sense. > > > See the kernel discussions here: > > https://marc.info/?l=linux-scsi&m=162636826305740&w=2 > > https://marc.info/?l=linux-scsi&m=164987222322261&w=2 > > > > Signed-off-by: brian@xxxxxxxxxxxxxxx > > Reviewed-by: sconnor@xxxxxxxxxxxxxxx > > ___ > > diff --git a/libmultipath/structs.h b/libmultipath/structs.h > > index d94f93a0..0af7e598 100644 > > --- a/libmultipath/structs.h > > +++ b/libmultipath/structs.h > > @@ -370,6 +370,7 @@ struct multipath { > > int failback_tick; > > > > int no_path_retry; /* number of retries after all paths are down */ > > + int no_path_retry_configured; /* value in config */ > > int retry_tick; /* remaining times for retries */ > > int disable_queueing; > > int minio; > > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c > > index 6c23df86..3db4e6f7 100644 > > --- a/libmultipath/structs_vec.c > > +++ b/libmultipath/structs_vec.c > > @@ -780,8 +780,15 @@ void update_queue_mode_add_path(struct multipath *mpp) > > { > > int active = count_active_paths(mpp); > > > > - if (active > 0) > > + if (active > 0) { > > leave_recovery_mode(mpp); > > + if (mpp->no_path_retry != mpp->no_path_retry_configured) { > > + condlog(2, "%s: return no path retry to %d > > from %d", mpp->alias, > > + mpp->no_path_retry_configured, mpp->no_path_retry); > > + mpp->no_path_retry = mpp->no_path_retry_configured; > > + } > > + } > > + > > condlog(2, "%s: remaining active paths: %d", mpp->alias, active); > > } > > > > diff --git a/multipathd/main.c b/multipathd/main.c > > index f2c0b280..92841d13 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -524,7 +524,7 @@ int update_multipath (struct vectors *vecs, char > > *mapname, int reset) > > struct multipath *mpp; > > struct pathgroup *pgp; > > struct path *pp; > > - int i, j; > > + int i, j, tpg; > > > > mpp = find_mp_by_alias(vecs->mpvec, mapname); > > > > @@ -553,6 +553,15 @@ int update_multipath (struct vectors *vecs, char > > *mapname, int reset) > > checkint = conf->checkint; > > put_multipath_config(conf); > > condlog(2, "%s: mark as failed", pp->dev); > > + tpg = get_target_port_group(pp, DEF_TIMEOUT); > > + if ((tpg >= 0) && > > + (mpp->no_path_retry == 0) && > > + (get_asymmetric_access_state(pp, > > tpg, DEF_TIMEOUT) == AAS_TRANSITIONING)) { > > + mpp->no_path_retry_configured > > = mpp->no_path_retry; > > + mpp->no_path_retry = (60 / checkint); > > + condlog(2, "%s: changed %s no > > path retry to %d", pp->dev, mpp->alias, > > + (60 / checkint)); > > + } > > mpp->stat_path_failures++; > > pp->state = PATH_DOWN; > > if (oldstate == PATH_UP || > > ___ > > > > -- > > Brian Bunker > > PURE Storage, Inc. > > brian@xxxxxxxxxxxxxxx > > > > -- > > dm-devel mailing list > > dm-devel@xxxxxxxxxx > > https://listman.redhat.com/mailman/listinfo/dm-devel > -- Brian Bunker PURE Storage, Inc. brian@xxxxxxxxxxxxxxx -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel