On Mi, 2021-07-14 at 11:13 -0700, Brian Bunker wrote: > On Wed, Jul 14, 2021 at 1:39 AM Martin Wilck <mwilck@xxxxxxxx> wrote: > > > > > When a command fails with ALUA TRANSITIONING status, we must make > > sure > > that: > > > > 1) The command itself is not retried on the path at hand, neither > > on > > the SCSI layer nor on the blk-mq layer. The former was the case > > anyway, > > the latter is guaranteed by 0d88232010d5 ("scsi: core: Return > > BLK_STS_AGAIN for ALUA transitioning"). > > > > 2) No new commands are sent down this path until it reaches a > > usable > > final state. This is achieved on the SCSI layer by alua_prep_fn(), > > with > > 268940b80fa4 ("scsi: scsi_dh_alua: Return BLK_STS_AGAIN for ALUA > > transitioning state"). > > > > These two items would still be true with your patch applied. > > However, > > the problem is that if the path isn't failed, dm-multipath would > > continue sending I/O down this path. If the path isn't set to > > failed > > state, the path selector algorithm may or may not choose a > > different > > path next time. In the worst case, dm-multipath would busy-loop > > retrying the I/O on the same path. Please consider the following: > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > index 86518aec32b4..3f3a89fc2b3b 100644 > > --- a/drivers/md/dm-mpath.c > > +++ b/drivers/md/dm-mpath.c > > @@ -1654,12 +1654,12 @@ static int multipath_end_io(struct > > dm_target *ti, struct request *clone, > > if (error && blk_path_error(error)) { > > struct multipath *m = ti->private; > > > > - if (error == BLK_STS_RESOURCE) > > + if (error == BLK_STS_RESOURCE || error == > > BLK_STS_AGAIN) > > r = DM_ENDIO_DELAY_REQUEUE; > > else > > r = DM_ENDIO_REQUEUE; > > > > - if (pgpath) > > + if (pgpath && (error != BLK_STS_AGAIN)) > > fail_path(pgpath); > > > > This way we'd avoid busy-looping by delaying the retry. This would > > cause I/O delay in the case where some healthy paths are still in > > the > > same dm-multipath priority group as the transitioning path. I > > suppose > > this is a minor problem, because in the default case for ALUA > > (group_by_prio in multipathd), all paths in the PG would have > > switched > > to "transitioning" simultaneously. > > > > Note that multipathd assigns priority 0 (the same prio as > > "unavailable") if it happens to notice a "transitioning" path. This > > is > > something we may want to change eventually. In your specific case, > > it > > would cause the paths to be temporarily re-grouped, all paths being > > moved to the same non-functional PG. The way you describe it, for > > your > > storage at least, "transitioning" should be assigned a higher > > priority. > > > > > > > Yes. I tried it with this change and it works. Should I re-submit > this > modified version or do you want to do it? Yes, please. Regards, Martin