Re: [PATCH 1/1] scsi_dh_alua: properly handling the ALUA transitioning state

Martin Wilck <mwilck@xxxxxxxx> · Fri, 20 May 2022 16:03:51 +0200

On Fri, 2022-05-20 at 14:06 +0200, Hannes Reinecke wrote:
> On 5/20/22 12:57, Martin Wilck wrote:
> > Brian, Martin,
> > 
> > sorry, I've overlooked this patch previously. I have to say I think
> > it's wrong and shouldn't have been applied. At least I need a more
> > in-depth explanation.
> > 
> > On Mon, 2022-05-02 at 20:50 -0400, Martin K. Petersen wrote:
> > > On Mon, 2 May 2022 08:09:17 -0700, Brian Bunker wrote:
> > > 
> > > > The handling of the ALUA transitioning state is currently broken.
> > > > When a target goes into this state, it is expected that the target
> > > > is allowed to stay in this state for the implicit transition
> > > > timeout without a path failure.
> > 
> > Can you please show me a quote from the specs on which this
> > expectation ("without a path failure") is based? AFAIK the SCSI specs
> > don't say anything about device-mapper multipath semantics.
> > 
> > > > The handler has this logic, but it gets
> > > > skipped currently.
> > > > 
> > > > When the target transitions, there is in-flight I/O from the
> > > > initiator. The first of these responses from the target will be a
> > > > unit attention letting the initiator know that the ALUA state has
> > > > changed. The remaining in-flight I/Os, before the initiator finds
> > > > out that the portal state has changed, will return not ready, ALUA
> > > > state is transitioning. The portal state will change to
> > > > SCSI_ACCESS_STATE_TRANSITIONING. This will lead to all new I/O
> > > > immediately failing the path unexpectedly. The path failure
> > > > happens in less than a second instead of the expected successes
> > > > until the transition timer is exceeded.
> > 
> > dm multipath has no concept of "transitioning" state. Path state can
> > be either active or inactive. As Brian wrote, commands sent to the
> > transitioning device will return NOT READY, TRANSITIONING, and
> > require retries on the SCSI layer. If we know this in advance, why
> > should we continue sending I/O down this semi-broken path? If other,
> > healthy paths are available, why would it not be the right thing to
> > switch I/O to them ASAP?
> > 
> But we do, don't we?
> Commands are being returned with the appropriate status, and
> dm-multipath should make the corresponding decisions here.
> This patch just modifies the check when _sending_ commands; i.e.
> multipath had decided that the path is still usable.
> Question rather would be why multipath did that;

If alua_prep_fn() got called, the path was considered usable at the
given point in time by dm-multipath. Most probably the reason was
simply that no error condition had occurred on this path before ALUA
state switched to transitioning. I suppose this can happen if storage
switches a PG consisting of multiple paths to TRANSITIONING. We get an
error on one path (sda, say), issue an RTPG, and receive the new ALUA
state for all paths of the PG. For all paths except sda, we'd just see
a switch to TRANSITIONING without a previous SCSI error.
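
As a toy illustration of that scenario (this is not kernel code; the
types and the rtpg_update() helper below are made up for this mail and
only model the idea that ALUA state is tracked per port group rather
than per path):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; the kernel uses SCSI_ACCESS_STATE_*. */
enum toy_state { TOY_ACTIVE, TOY_TRANSITIONING };

struct toy_path {
	const char *name;
	bool saw_scsi_error;	/* did a command on this path return an error? */
};

struct toy_port_group {
	enum toy_state state;	/* one state shared by all member paths */
	struct toy_path *paths;
	size_t npaths;
};

/*
 * Stand-in for processing an RTPG response: the new state applies to the
 * whole port group, not only to the path that triggered the RTPG.
 */
static void rtpg_update(struct toy_port_group *pg, enum toy_state new_state)
{
	pg->state = new_state;
}

/* Count paths that are now TRANSITIONING although they never saw an error. */
static size_t transitioning_without_error(const struct toy_port_group *pg)
{
	size_t i, n = 0;

	if (pg->state != TOY_TRANSITIONING)
		return 0;
	for (i = 0; i < pg->npaths; i++)
		if (!pg->paths[i].saw_scsi_error)
			n++;
	return n;
}
```

With three paths sda, sdb and sdc in one PG and an error only on sda,
transitioning_without_error() returns 2 after
rtpg_update(&pg, TOY_TRANSITIONING): sdb and sdc change state without
ever having failed a command.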

With this patch, we'll dispatch I/O (usually an entire bunch) to these
paths despite seeing them in TRANSITIONING state. Eventually, when the
SCSI responses are received, this leads to path failures. If I/O
latencies are small, this happens after a few ms. In that case, the
goal of Brian's patch is not reached, because the time until path
failure would still be on the order of milliseconds. OTOH, if latencies
are high, it takes substantially longer for the kernel to realize that
the path is non-functional, while other, good paths may be idle. I fail
to see the benefit.
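
To put the timing argument in concrete terms, here is a deliberately
crude model (again not kernel code; toy_model_io() and its parameters
are hypothetical, and SCSI-layer retries are ignored). It assumes that a
command refused at dispatch time fails the path at once, while a command
dispatched to a TRANSITIONING path fails it when the response arrives:

```c
#include <stdbool.h>

/* Hypothetical, simplified states for this model only. */
enum toy_alua_state { TOY_ALUA_ACTIVE, TOY_ALUA_TRANSITIONING };

struct toy_outcome {
	bool dispatched;		/* was the command sent to the device? */
	bool path_fails;		/* does this command lead to a path failure? */
	unsigned int ms_until_failure;	/* modeled delay until that failure */
};

/*
 * dispatch_in_transitioning == false models refusing I/O up front,
 * dispatch_in_transitioning == true models sending it anyway.
 */
static struct toy_outcome toy_model_io(enum toy_alua_state state,
				       bool dispatch_in_transitioning,
				       unsigned int io_latency_ms)
{
	struct toy_outcome o = { true, false, 0 };

	if (state != TOY_ALUA_TRANSITIONING)
		return o;	/* healthy path: dispatched, no failure */

	o.path_fails = true;
	if (dispatch_in_transitioning) {
		/* Failure is only noticed once the response comes back. */
		o.ms_until_failure = io_latency_ms;
	} else {
		/* Refused before dispatch: failure is immediate. */
		o.dispatched = false;
		o.ms_until_failure = 0;
	}
	return o;
}
```

In this model, dispatching with a 2 ms latency delays the path failure
by 2 ms, nowhere near a transition timeout; with a 500 ms latency the
failure is delayed by 500 ms, during which the I/O is stuck on the
transitioning path.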

Regards,
Martin

> however that logic isn't modified here.
> 
> Cheers,
> 
> Hannes