Re: [PATCH] multipathd: the sysfs prioritizer can return stale data

Martin Wilck <mwilck@xxxxxxxx> · Wed, 31 Jan 2024 11:19:41 +0100

Hi Brian,

On Tue, 2024-01-30 at 10:43 -0800, Brian Bunker wrote:
> > 
> > A full rescan shouldn't be necessary. All that's needed is that the
> > kernel issue another RTPG. AFAICS that should happen as soon as the
> > target responds to any command with a sense key of UNIT ATTENTION with
> > ASC=0x2a and ASCQ=6 or 7 (ALUA state change, ALUA state transition
> > failed).
> > 
> > @Brian, does your storage unit not do this? If so, I suggest we disable
> > the sysfs prioritizer for Pure Storage.
> > 
> > Otherwise, as far as multipathd is concerned, when a path is
> > reinstated, it should be sufficient to send any IO command to trigger
> > an RTPG. Or am I missing something here?
> > 
> > Martin
> 
> Martin,
> 
> What I am gaining with the rescan is exactly that. You are correct:
> the ALUA device handler in the kernel has to send an RTPG to the target.
> 
> We do set a unit attention to have the initiator paths go into the
> ANO state before we reboot leading to the path loss, but we do not
> set a unit attention when the paths come back up.
> 
> We have relied on the initiator’s polling to pick up the ALUA state
> change, which they always have in the past and the ‘alua’ prioritizer
> still will. For us to add a unit attention would work, but there are
> a couple of issues with that.
> 
> 1. Unit attentions may not get back to the initiator. It is not
> guaranteed.

That's news to me. If that happens, wouldn't it mean that the
initiator sees a timeout (no response) to some command, IOW that
there's still something very wrong with this I_T nexus?

> 2. Paths could take a very long time to come back. We might not get
> these paths back for a very long time. Sometimes it is just a reboot.
> Other times it is a hardware replacement. It is possible for us to
> keep this state forever and post it when that I_T nexus returns but
> we haven’t had to.

No offense, that sounds somewhat lazy ;-) Note that it's also kind of
dangerous. You are hiding the state change from the initiator. If the
Linux kernel decided to use the access_state device attribute for
anything other than feeding the sysfs attribute, things might go badly
wrong.

> If we did post the unit attention, everything works as expected. I
> have verified this, but I would also hope that the polling of the
> checkers would also unstick my stale ALUA state and we won’t have to.
> 
> I put this rescan_path inline to show the problem and the fix. I
> wasn’t sure the ‘right’ place to put it. I get that it would be
> better not to block on this. It should be possible to put this in a
> thread so that it does not. The other caller of rescan_path I guess
> is also doing the same thing when it is handling the wwid change.

That's true and not optimal, but wwid changes are rare events and an
error condition in their own right. Patches moving rescan_path() into
a thread would receive sympathetic reviews :-)
The big benefit of the sysfs prioritizer is that it never blocks,
without needing pthreads complexity.
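
To illustrate why this never blocks, and also why it can go stale, here
is a minimal sketch of what a sysfs-based prioritizer boils down to. It
is not the multipath-tools code: the function names and the numeric
priorities are made up for the example, and the state strings are what I
believe the kernel prints in the access_state attribute. The point is
that it only reads a value the kernel cached at the last RTPG; it never
sends a command to the device.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical priority values, chosen for illustration only. */
enum { PRIO_OPTIMIZED = 50, PRIO_NON_OPTIMIZED = 10, PRIO_STANDBY = 1,
       PRIO_OTHER = 0 };

/* Compare the first len bytes of s against a full state name. */
int state_is(const char *s, size_t len, const char *name)
{
	return len == strlen(name) && strncmp(s, name, len) == 0;
}

/* Map the text of an "access_state" attribute to a priority. */
int prio_from_access_state(const char *state)
{
	/* sysfs values normally end with a newline; ignore it */
	size_t len = strcspn(state, "\n");

	if (state_is(state, len, "active/optimized"))
		return PRIO_OPTIMIZED;
	if (state_is(state, len, "active/non-optimized"))
		return PRIO_NON_OPTIMIZED;
	if (state_is(state, len, "standby"))
		return PRIO_STANDBY;
	return PRIO_OTHER;
}

/*
 * Read the attribute file at path and map it to a priority.
 * Returns -1 if the file cannot be read. This is a plain file read:
 * the kernel hands back the state it stored after the last RTPG.
 */
int prio_from_sysfs(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return prio_from_access_state(buf);
}
```

A caller would pass the path of the SCSI device's access_state attribute
to prio_from_sysfs(). If the target never prompts the kernel to send a
new RTPG, this keeps returning the old value indefinitely.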

Btw, I think you'd need to wait for the RTPG to finish after triggering
the rescan, if you want to obtain the right priority (alua_rescan() ->
alua_initialize() -> alua_check_vpd() will only queue an RTPG and not
wait for the result).

Unfortunately, the kernel has no API for manually triggering an update
of the access_state. I believe that would be useful elsewhere, too. We
can consider adding it, but it won't help with current kernels.

IMO the best option for your storage arrays would be to force using the
alua prioritizer rather than the sysfs one. You are not alone, we're
doing this for RDAC already (see check_rdac() call in detect_prio()).
This can be configured in multipath.conf right now by setting
"detect_prio no" and "prio alua", and we can make it the default for
your storage with a trivial patch for hwtable.c.
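
As a sketch, such an override in multipath.conf could look like the
fragment below. The vendor and product strings are my assumption here;
please check them against what your arrays actually report before using
this.

```
devices {
	device {
		# assumed vendor/product strings, verify for your array
		vendor "PURE"
		product "FlashArray"
		# do not auto-select the sysfs prioritizer
		detect_prio "no"
		# always query the ALUA state from the device
		prio "alua"
	}
}
```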

Regards
Martin