Re: [PATCH] multipathd: the sysfs prioritizer can return stale data

> On Jan 31, 2024, at 2:19 AM, Martin Wilck <mwilck@xxxxxxxx> wrote:
> 
> Hi Brian,
> 
> On Tue, 2024-01-30 at 10:43 -0800, Brian Bunker wrote:
>>> 
>>> A full rescan shouldn't be necessary. All that's needed is that the
>>> kernel issue another RTPG. AFAICS that should happen as soon as the
>>> target responds to any command with a sense key of UNIT ATTENTION
>>> with
>>> ASC=0x2a and ascq=6 or 7 (ALUA state change, ALUA state transition
>>> failed).
>>> 
>>> @Brian, does your storage unit not do this? If so, I suggest we
>>> disable
>>> the sysfs prioritizer for pure storage.
>>> 
>>> Otherwise, as far as multipathd is concerned, when a path is
>>> reinstated, it should be sufficient to send any IO command to
>>> trigger
>>> an RTPG. Or am I missing something here?
>>> 
>>> Martin
>> 
>> Martin,
>> 
>> What I am gaining with the rescan is exactly that. You are correct
>> that the ALUA device handler in the kernel has to send an RTPG to the
>> target.
>> 
>> We do set a unit attention to have the initiator paths go into the
>> ANO state before we reboot leading to the path loss, but we do not
>> set a unit attention when the paths come back up.
>> 
>> We have relied on the initiator’s polling to pick up the ALUA state
>> change which they always have in the past and the ‘alua’ prioritizer
>> still will. For us to add a unit attention would work, but there are
>> a couple of issues with that.
>> 
>> 1. Unit attentions may not get back to the initiator. It is not
>> guaranteed.
> 
> That's news for me. If that happens, wouldn't it mean that the
> initiator sees a timeout (no response) to some command, IOW that
> there's still something very wrong with this I_T nexus?
Probably. My point is just that any individual response could be lost
somewhere along the way, and the target is under no obligation to
ensure that the initiator actually received the unit attention.
> 
>> 2. Paths could take a very long time to come back. We might not get
>> these paths back for a very long time. Sometimes it is just a reboot.
>> Other times it is a hardware replacement. It is possible for us to
>> keep this state forever and post when that I_T nexus returns but
>> we haven’t had to.
> 
> No offense, that sounds somewhat lazy ;-) Note that it's also kind of
> dangerous. You are hiding the state change from the initiator. If the
> Linux kernel decided to use the access_state device attribute for
> anything else but feeding the sysfs attribute, things might go badly
> wrong.
That is fair. We definitely could do better here. In general, that unit
attention coming out of the preferred state didn’t buy us any speed.
Those non-preferred paths weren’t serving I/O so the first I/O that
would pick up the unit attention on those paths would be the path
checker. The same run of the path checker picked up the new ALUA
state. When going into the preferred state, there is read and write
I/O which means those unit attentions are picked up very quickly and
the ALUA state change is picked up in the kernel before the checker
runs again.

Have you ever considered a checker based on RTPG as opposed to TUR?
That would seemingly solve a lot of this too, since you would be getting
path state and priority in the same round trip.
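
To illustrate why a single RTPG could carry both pieces of information,
here is a rough sketch (Python for brevity, not multipath-tools code) that
decodes a REPORT TARGET PORT GROUPS response buffer. The byte layout is the
length-only header format from SPC as I understand it; the extended header
format is not handled, and parse_rtpg() is a made-up name, not an existing
function.

```python
import struct

# Asymmetric access state codes (low nibble of descriptor byte 0).
AAS_NAMES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0x4: "lba-dependent",
    0xE: "offline",
    0xF: "transitioning",
}


def parse_rtpg(buf):
    """Decode an RTPG response (length-only header) into a list of dicts.

    Hypothetical helper for illustration; assumes the 4-byte RETURN DATA
    LENGTH header followed by 8-byte group descriptors, each trailed by
    4-byte target port descriptors.
    """
    if len(buf) < 4:
        raise ValueError("short RTPG response")
    (data_len,) = struct.unpack_from(">I", buf, 0)
    end = min(len(buf), 4 + data_len)
    groups = []
    off = 4
    while off + 8 <= end:
        byte0 = buf[off]
        (tpg_id,) = struct.unpack_from(">H", buf, off + 2)
        port_count = buf[off + 7]
        ports = []
        for i in range(port_count):
            poff = off + 8 + 4 * i
            if poff + 4 > end:
                break  # truncated buffer, keep what we have
            ports.append(struct.unpack_from(">H", buf, poff + 2)[0])
        aas = byte0 & 0x0F
        groups.append({
            "tpg": tpg_id,
            "preferred": bool(byte0 & 0x80),
            "state": AAS_NAMES.get(aas, "reserved(0x%x)" % aas),
            "ports": ports,
        })
        off += 8 + 4 * port_count
    return groups
```

A checker built this way would treat a successful RTPG as "path up" and
take the priority from the group that contains the path's relative target
port, so no second command would be needed.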
> 
>> If we did post the unit attention, everything works as expected. I
>> have verified this, but I would also hope that the polling of the
>> checkers would also unstick my stale ALUA state and we won’t have to.
>> 
>> I put this rescan_path inline to show the problem and the fix. I
>> wasn’t sure the ‘right’ place to put it. I get that it would be
>> better not to block on this. It should be possible to put this in a
>> thread so that it does not. The other caller of rescan_path I guess
>> is also doing the same thing when it is handling the wwid change.
> 
> That's true and not optimal, but wwid changes are rare events and an
> error condition in its own right. Patches moving
> rescan_path() into a thread would receive sympathetic reviews :-)
> The big benefit of the sysfs prioritizer is that it never blocks,
> without needing pthreads complexity.
> 
> Btw, I think you'd need to wait for the RTPG to finish after triggering
> the rescan, if you want to obtain the right priority (alua_rescan() ->
> alua_initialize() -> alua_check_vpd() will only queue an RTPG and not
> wait for the result).
For our purposes we didn’t really need to wait for the rescan, as long as
it happened. The next time the checker ran, it would pick up the result.
These paths returning for us are redundant paths. We want them back as soon
as possible, but we have other paths that can serve I/O while waiting for
the HA paths.
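
For anyone wanting to watch the stale value directly, the state that the
‘sysfs’ prioritizer consumes can be read from the SCSI device's
access_state and preferred_path attributes. The sketch below is only an
approximation: the priority numbers are what I believe the prioritizer
uses and should be treated as illustrative, the exclusive_pref handling is
left out, and sysfs_prio() is a made-up helper name.

```python
import os

# Assumed to mirror the 'sysfs' prioritizer's mapping; illustrative only.
PRIO_BY_STATE = {
    "active/optimized": 50,
    "active/non-optimized": 10,
    "lba-dependent": 5,
    "standby": 1,
}
PREFERRED_BONUS = 80


def read_attr(devname, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<devname>/device/<attr>."""
    path = os.path.join(sysfs_root, devname, "device", attr)
    with open(path) as f:
        return f.read().strip()


def sysfs_prio(devname, sysfs_root="/sys/block"):
    """Return (access_state, priority) as currently cached by the kernel."""
    state = read_attr(devname, "access_state", sysfs_root)
    try:
        preferred = read_attr(devname, "preferred_path", sysfs_root) == "1"
    except OSError:
        preferred = False  # attribute missing: treat as not preferred
    prio = PRIO_BY_STATE.get(state, 0)
    if preferred and state == "active/optimized":
        prio += PREFERRED_BONUS
    return state, prio
```

Calling sysfs_prio("sdX") before and after the array comes back should
show whether the kernel's cached state has actually been refreshed.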

I can create a patch in the sysfs prioritizer to do the rescan_path in a thread
that the checker and priority run don’t wait on. Would that be well received,
or am I better served by either posting a unit attention or just using
detect_prio set to no and leaving the ‘sysfs’ prioritizer alone?
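
The shape of the non-blocking variant would be roughly the following. This
is a Python stand-in for what would be a pthread in multipathd, not a
proposed implementation; it assumes the rescan is triggered by writing "1"
to the SCSI device's rescan attribute in sysfs, and rescan_path_async() is
a hypothetical name.

```python
import os
import threading


def rescan_path_async(devname, sysfs_root="/sys/block"):
    """Kick off a SCSI device rescan without blocking the caller.

    Hypothetical helper: writes "1" to <sysfs_root>/<devname>/device/rescan
    from a daemon thread and returns the thread object immediately.
    """
    attr = os.path.join(sysfs_root, devname, "device", "rescan")

    def _worker():
        try:
            with open(attr, "w") as f:
                f.write("1")
        except OSError:
            # Path may have gone away again; the next checker run retries.
            pass

    t = threading.Thread(target=_worker, name="rescan-" + devname, daemon=True)
    t.start()
    return t
```

The caller does not join the thread; the updated priority would simply be
picked up on a later checker run once the kernel has completed its RTPG.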
> 
> Unfortunately, the kernel has no API for manually triggering an update
> of the access_state. I believe that would be useful elsewhere, too. We
> can consider adding it, but it won't help with current kernels.
> 
> IMO the best option for your storage arrays would be to force using the
> alua prioritizer rather than the sysfs one. You are not alone, we're
> doing this for RDAC already (see check_rdac() call in detect_prio()).
> This can be configured in multipath.conf right now by setting
> "detect_prio no" and "prio alua", and we can make it the default for
> your storage with a trivial patch for hwtable.c.
This is what we are doing now in our recommended configuration. I will
probably add a patch for our hw table entry soon. It still seems a bit
strange to me that detect_prio overrides the prioritizer that I explicitly
set in the device section. I would expect detect_prio to apply only when I
haven’t provided one and want multipath to choose for me.
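
For reference, the override discussed above would look something like this
in multipath.conf. Only the detect_prio and prio settings come from this
thread; the vendor and product strings are placeholders and need to match
what the array actually reports.

```
devices {
	device {
		vendor "PURE"
		product "FlashArray"
		detect_prio "no"
		prio "alua"
	}
}
```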
> 
> Regards
> Martin
> 
Thanks,
Brian





