On Tue, 2024-01-30 at 11:41 -0500, Benjamin Marzinski wrote:
> On Tue, Jan 23, 2024 at 06:00:07PM -0800, Brian Bunker wrote:
> > When a path is lost and then reinstated later, the ALUA
> > device handler will not pick up this change and continue
> > to possibly provide incorrect (stale) information about
> > its ALUA state to the 'sysfs' prioritizer if the path's
> > priorities were changed prior to the loss of those paths.
> >
> > On the loss of an I_T nexus, the path state should not
> > continue to use its last known state. Many things and a lot
> > of time could have passed between the path loss and its
> > eventual reinstatement.
> >
> > The ALUA device handler didn't have this issue since it
> > always got the ALUA state from the target by sending an RTPG
> > request on the path and updated the priority. However with
> > the detect priority set to true by default, the 'sysfs'
> > prioritizer will be used on targets that support ALUA rather
> > than the 'alua' prioritizer.
> >
> > Without re-evaluating the ALUA state when a path returns
> > multipath is left with path states which do not reflect
> > the actual ALUA states the target is providing.
> >
> > 3624a9370f0f545fc7c3e46a100011010 dm-2 PURE,FlashArray
> > size=3.0T features='0' hwhandler='1 alua' wp=rw
> > |-+- policy='service-time 0' prio=50 status=active
> > | |- 13:0:0:1 sdg 8:96  active ready running
> > | |- 12:0:0:1 sdf 8:80  active ready running
> > | |- 9:0:0:1  sdc 8:32  active ready running
> > | `- 1:0:0:1  sdb 8:16  active ready running
> > `-+- policy='service-time 0' prio=10 status=enabled
> >   |- 14:0:0:1 sdh 8:112 active ready running
> >   |- 15:0:0:1 sdi 8:128 active ready running
> >   |- 10:0:0:1 sdd 8:48  active ready running
> >   `- 11:0:0:1 sde 8:64  active ready running
> >
> > # sg_rtpg /dev/sdh (Active/Optimized)
> > target port group asymmetric access state : 0x00
> >
> > Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx>
> > Signed-off-by: Seamus Connor <sconnor@xxxxxxxxxxxxxxx>
> > ---
> >  multipathd/main.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 230c9d10..dd48be74 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1937,6 +1937,11 @@ reinstate_path (struct path * pp)
> >  	else {
> >  		condlog(2, "%s: reinstated", pp->dev_t);
> >  		update_queue_mode_add_path(pp->mpp);
> > +		if (strcmp(pp->prio.name, PRIO_SYSFS) == 0) {
> > +			condlog(2, "%s: rescan target to update priorities",
> > +				pp->dev_t);
> > +			rescan_path(pp->udev);
>
> I can see why this is necessary with the sysfs target, but AFAICT
> rescanning the scsi device will end up calling scsi_execute_cmd to
> update the vpd pages, and this can block. Obviously, we are doing this
> after the checker has verified that the path is up, but even still,
> we're trying to cut down on the amount of code that can block multipathd
> on an inaccessible device, instead of increase it.
>
> Perhaps it would be better to make the prioritizer itself smarter, so
> that it could know when it needs to run the rescan(), and it could do
> that in a separate thread.
>
> Thoughts?

A full rescan shouldn't be necessary. All that's needed is that the
kernel issue another RTPG. AFAICS that should happen as soon as the
target responds to any command with a sense key of UNIT ATTENTION with
ASC=0x2a and ASCQ=6 or 7 (ALUA state change, ALUA state transition
failed).

@Brian, does your storage unit not do this? If so, I suggest we disable
the sysfs prioritizer for Pure Storage.

Otherwise, as far as multipathd is concerned, when a path is
reinstated, it should be sufficient to send any IO command to trigger
an RTPG. Or am I missing something here?

Martin
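
For reference, the sense condition described above can be sketched as a small standalone check. This is only an illustration, assuming the standard SPC fixed (0x70/0x71) and descriptor (0x72/0x73) sense layouts; the names `sense_triple`, `decode_sense` and `sense_is_alua_change` are hypothetical and are not taken from the kernel or from multipath-tools.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION  0x06
#define ASC_PARAMETERS_CHANGED    0x2a
#define ASCQ_ALUA_STATE_CHANGED   0x06 /* asymmetric access state changed */
#define ASCQ_ALUA_TRANSITION_FAIL 0x07 /* implicit state transition failed */

/* Hypothetical helper type: the three values the check depends on. */
struct sense_triple {
	uint8_t key;
	uint8_t asc;
	uint8_t ascq;
};

/* Extract sense key, ASC and ASCQ from a raw sense buffer.
 * Returns false if the buffer is too short or the format is unknown. */
static bool decode_sense(const uint8_t *buf, size_t len,
			 struct sense_triple *out)
{
	uint8_t code;

	if (buf == NULL || out == NULL || len < 1)
		return false;

	code = buf[0] & 0x7f;
	if (code == 0x70 || code == 0x71) {
		/* Fixed format: key in byte 2, ASC/ASCQ in bytes 12/13. */
		if (len < 14)
			return false;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return true;
	}
	if (code == 0x72 || code == 0x73) {
		/* Descriptor format: key in byte 1, ASC/ASCQ in bytes 2/3. */
		if (len < 4)
			return false;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return true;
	}
	return false;
}

/* True if the sense data reports UNIT ATTENTION with ASC 0x2a and
 * ASCQ 6 or 7, i.e. the condition that should prompt a fresh RTPG. */
static bool sense_is_alua_change(const uint8_t *buf, size_t len)
{
	struct sense_triple s;

	if (!decode_sense(buf, len, &s))
		return false;
	return s.key == SENSE_KEY_UNIT_ATTENTION &&
	       s.asc == ASC_PARAMETERS_CHANGED &&
	       (s.ascq == ASCQ_ALUA_STATE_CHANGED ||
		s.ascq == ASCQ_ALUA_TRANSITION_FAIL);
}
```

A caller would pass the sense buffer returned with a failed command, for example from an SG_IO request, and treat a true result as the signal that the cached ALUA state is stale.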