On Tue, 2021-04-27 at 20:33 +0000, Martin Wilck wrote:
> On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> >
> > There's no way to do that, in principle. Because there could be
> > other I/Os in flight. You might (somehow) avoid retrying an I/O
> > that got a UA until you figured out if something changed, but other
> > I/Os can already have been sent to the target, or issued before you
> > get to look at the status.
>
> Right. But in practice, a WWID change will hardly happen under full
> IO load. The storage side will probably have to block IO while this
> happens, at least for a short time period. So blocking and quiescing
> the queue upon an UA might still work, most of the time. Even if we
> were too late already, the sooner we stop the queue, the better.
>
> The current algorithm in multipath-tools needs to detect a path going
> down and being reinstated. The time interval during which a WWID
> change will go unnoticed is one or more path checker intervals,
> typically on the order of 5-30 seconds. If we could decrease this
> interval to a sub-second or even millisecond range by blocking the
> queue in the kernel quickly, we'd have made a big step forward.

Yes, and in many situations this may help. But in the general case
we can't protect against a storage array misconfiguration, where
something like this can happen. So I worry about people believing
the host software will protect them against a mistake, when we
can't really do that.

All it takes is one I/O (a discard) to make a thorough mess of the LUN.

-Ewan

>
> Regards
> Martin