On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:
On Tue, 2021-04-27 at 20:33 +0000, Martin Wilck wrote:
On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
There's no way to do that, in principle. Because there could be other I/Os in flight. You might (somehow) avoid retrying an I/O that got a UA until you figured out if something changed, but other I/Os can already have been sent to the target, or issued before you get to look at the status.
If something happens on the storage side where a LUN gets its attributes changed (any, doesn't matter which one), a UA should be sent. Also, all outstanding IOs on that LUN should return an Abort, since the array can no longer warrant the validity of any IO after these changes, especially when parameters like reservations (PRs) are involved. If that does not happen on the array side, all bets are off, as the only way to get back in business is to start from scratch.
Right. But in practice, a WWID change will hardly happen under full IO load. The storage side will probably have to block IO while this happens, at least for a short time period. So blocking and quiescing the queue upon a UA might still work, most of the time. Even if we were too late already, the sooner we stop the queue, the better.
I think in most cases when something happens on the array side you will see IOs being aborted. That might be a good time to start doing TURs and, if these come back OK, do a new inquiry. From the host side there is only so much you can do.
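To make the "TUR, then inquiry" idea concrete, here is a minimal sketch of the decision logic only. It is an illustration under assumptions, not how the kernel or multipath-tools does it: test_unit_ready and read_wwid are hypothetical callables standing in for whatever actually issues TEST UNIT READY and an INQUIRY (for example the device identification VPD page 0x83), and max_turs is an arbitrary retry limit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecheckResult:
    ready: bool                    # did any TUR succeed within the retry limit?
    wwid_changed: Optional[bool]   # None if the LUN never became ready


def recheck_lun(
    known_wwid: str,
    test_unit_ready: Callable[[], bool],  # hypothetical: True when TUR returns GOOD
    read_wwid: Callable[[], str],         # hypothetical: re-reads identity via INQUIRY
    max_turs: int = 5,                    # assumed retry limit, not from the thread
) -> RecheckResult:
    """After aborted I/O: poll TUR, then re-read the identity and compare."""
    for _ in range(max_turs):
        if test_unit_ready():
            # The LUN answers again, so re-inquire before trusting it.
            return RecheckResult(ready=True, wwid_changed=read_wwid() != known_wwid)
    # Never became ready: nothing can be said about the identity.
    return RecheckResult(ready=False, wwid_changed=None)
```

Note that this only narrows the window. I/O already in flight before the check runs is not covered by it.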
The current algorithm in multipath-tools needs to detect a path going down and being reinstated. The time interval during which a WWID change will go unnoticed is one or more path checker intervals, typically on the order of 5-30 seconds. If we could decrease this interval to a sub-second or even millisecond range by blocking the queue in the kernel quickly, we'd have made a big step forward.

Yes, and in many situations this may help. But in the general case we can't protect against a storage array misconfiguration, where something like this can happen. So I worry about people believing the host software will protect them against a mistake, when we can't really do that.
My thought exactly.
All it takes is one I/O (a discard) to make a thorough mess of the LUN.

-Ewan

Regards
Martin
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel