Re: [PATCH 1/1]: scsi scsi_dh_alua: don't fail I/O until transition time expires

On 5/28/21 8:34 PM, Brian Bunker wrote:
> Do not return an error to multipath which will result in a failed path until the transition time expires.
> 
> The current patch which returns BLK_STS_AGAIN for ALUA transitioning breaks the assumptions in our target
> regarding ALUA states. With that change an error is very quickly returned to multipath which in turn
> immediately fails the path. The assumption in that patch seems to be that another path will be available
> for multipath to use. That assumption I don’t believe is fair to make since while one path is in a
> transitioning state it is reasonable to assume that other paths may also be in non active states. 
> 
I beg to disagree. Path groups are nominally independent, and might
change state independently of the other path groups.
While for some arrays a 'transitioning' state is indeed system-wide,
other arrays might be able to serve I/O on other paths while one is
transitioning.
So I'd rather not presume anything here.


> The SPC spec has a note on this:
> The IMPLICIT TRANSITION TIME field indicates the minimum amount of time in seconds the application client
> should wait prior to timing out an implicit state transition (see 5.15.2.2). A value of zero indicates that
> the implicit transition time is not specified.
> 
Oh, I know _that_ one. What with me being one of the implementors asking
for it :-)

But again, this is _per path_. One cannot assume anything about _other_
paths here.

> In the SCSI ALUA device handler a value of 0 translates to the transition time being set to 60 seconds.
> The current approach of failing I/O on the transitioning path in a much quicker time than what is stated
> seems to violate this aspect of the specification.
> #define ALUA_FAILOVER_TIMEOUT		60
> unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
> 

No. The implicit transition timeout determines how long we keep
sending a 'TEST UNIT READY'/'REPORT TARGET PORT GROUPS' combo
to figure out whether this particular path is still transitioning. Once
this timeout is exceeded we set the path to 'standby'.
And this 'setting port to standby' is our action for 'timing out an
implicit state transition' as defined by the spec.
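
To illustrate the timeout handling described above, here is a minimal
user-space sketch. It is not the scsi_dh_alua code: the enum and both
helper functions are hypothetical names made up for illustration, and
only ALUA_FAILOVER_TIMEOUT is taken from the quoted snippet. It assumes
a reported transition time of 0 falls back to the 60 second default, and
that a path still reporting 'transitioning' after the timeout is moved
to 'standby'.

```c
/* Default from the quoted handler snippet, in seconds. */
#define ALUA_FAILOVER_TIMEOUT 60

/* Hypothetical, simplified view of a port group's access state. */
enum path_state { PATH_ACTIVE, PATH_STANDBY, PATH_TRANSITIONING };

/*
 * Hypothetical helper: an IMPLICIT TRANSITION TIME of 0 means
 * "not specified", so fall back to the default.
 */
static unsigned int effective_transition_tmo(unsigned int reported_secs)
{
	return reported_secs ? reported_secs : ALUA_FAILOVER_TIMEOUT;
}

/*
 * Hypothetical helper: evaluate the result of one TUR/RTPG poll.
 * 'reported' is the state the target returned, 'elapsed_secs' is the
 * time since the transition was first seen.
 */
static enum path_state next_state(enum path_state reported,
				  unsigned int elapsed_secs,
				  unsigned int tmo_secs)
{
	if (reported != PATH_TRANSITIONING)
		return reported;		/* reached a final state */
	if (elapsed_secs >= tmo_secs)
		return PATH_STANDBY;		/* transition timed out */
	return PATH_TRANSITIONING;		/* keep polling */
}
```

The point of the sketch is that the timeout only bounds the polling of
this one path; it says nothing about whether I/O is held or redirected
in the meantime.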

> This patch uses the expiry the same way it is used elsewhere in the device handler. Once the transition
> state is entered keep retrying until the expiry value is reached. If that happens, return the error to
> multipath the same way that is currently done with BLK_STS_AGAIN.
> 
And that is precisely what I want to avoid.

As outlined above, we cannot assume that all paths will be set to
'transitioning' once we hit the 'transitioning' condition on one path.
As such, we need to retry the I/O on other paths, to ensure failover
does work in these cases. Hence it's perfectly okay to set this path to
'failed' as we cannot currently send I/O to that path.

If, however, we are hitting a 'transitioning' status on _all_ paths (i.e.
all paths are set to 'failed') we need to ensure that we do _not_ fail
the I/O (as technically the paths are still alive), but retry with
TUR/RTPG until one path reaches a final state.
Then we should reinstate that path and continue with I/O.
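
The intended policy from the two paragraphs above can be sketched
roughly as follows. This is a stand-alone illustration, not dm-multipath
or scsi_dh_alua code; the enums and decide_io() are hypothetical. It
also assumes that a mix of dead and transitioning paths is treated like
the all-transitioning case, which the text above does not spell out.

```c
#include <stddef.h>

/* Hypothetical per-path status as seen by the multipath layer. */
enum path_status {
	PATH_USABLE,			/* can take I/O right now */
	PATH_FAILED_TRANSITIONING,	/* failed, but target is transitioning */
	PATH_FAILED_DEAD		/* failed for any other reason */
};

/* Hypothetical outcome for an I/O that hit 'transitioning'. */
enum io_action {
	IO_RETRY_ON_OTHER_PATH,	/* another path can serve it */
	IO_HOLD_AND_POLL,	/* no usable path; retry TUR/RTPG, keep the I/O */
	IO_FAIL			/* nothing left that could recover */
};

static enum io_action decide_io(const enum path_status *paths, size_t n_paths)
{
	size_t transitioning = 0;
	size_t i;

	for (i = 0; i < n_paths; i++) {
		if (paths[i] == PATH_USABLE)
			return IO_RETRY_ON_OTHER_PATH;
		if (paths[i] == PATH_FAILED_TRANSITIONING)
			transitioning++;
	}
	/* Assumption: any transitioning path is still technically alive. */
	return transitioning ? IO_HOLD_AND_POLL : IO_FAIL;
}
```

With one usable path the I/O is simply redirected; only when every
remaining path is transitioning is the I/O held until a path reaches a
final state and can be reinstated.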

I thought that this is what we do already; but it might be that there are
some issues lurking here.

So what is the error you are seeing?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@xxxxxxx			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


