On 5/20/22 12:57, Martin Wilck wrote:
Brian, Martin,
sorry, I overlooked this patch previously. I have to say I think
it's wrong and shouldn't have been applied. At the very least, I need
a more in-depth explanation.
On Mon, 2022-05-02 at 20:50 -0400, Martin K. Petersen wrote:
On Mon, 2 May 2022 08:09:17 -0700, Brian Bunker wrote:
The handling of the ALUA transitioning state is currently broken. When
a target goes into this state, it is expected that the target is
allowed to stay in this state for the implicit transition timeout
without a path failure.
Can you please show me a quote from the specs on which this expectation
("without a path failure") is based? AFAIK the SCSI specs don't say
anything about device-mapper multipath semantics.
The handler has this logic, but it gets
skipped currently.
When the target transitions, there is in-flight I/O from the
initiator. The first of these responses from the target will be a unit
attention letting the initiator know that the ALUA state has changed.
The remaining in-flight I/Os, before the initiator finds out that the
portal state has changed, will return not ready, ALUA state is
transitioning. The portal state will change to
SCSI_ACCESS_STATE_TRANSITIONING. This will lead to all new I/O
immediately failing the path unexpectedly. The path failure happens in
less than a second instead of the expected successes until the
transition timer is exceeded.
dm multipath has no concept of "transitioning" state. Path state can be
either active or inactive. As Brian wrote, commands sent to the
transitioning device will return NOT READY, TRANSITIONING, and require
retries on the SCSI layer. If we know this in advance, why should we
continue sending I/O down this semi-broken path? If other, healthy
paths are available, why would it not be the right thing to switch
I/O to them ASAP?
But we do, don't we?
Commands are being returned with the appropriate status, and
dm-multipath should make the corresponding decisions here.
This patch just modifies the check when _sending_ commands; i.e.
multipath had already decided that the path is still usable.
The question would rather be why multipath did that; however, that
logic isn't modified here.
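
To make the distinction concrete, here is a minimal sketch in plain C of
the two decision points discussed in this thread: the check made when a
command is about to be sent, and the retry decision made when a command
comes back NOT READY, transitioning. Every name in it (access_state,
submit_check, retry_transitioning, the fail_transitioning flag) is
hypothetical and invented for illustration; it is not the scsi_dh_alua
code, and the set of states treated as usable is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model only; not the kernel's types or functions. */
enum access_state {
    STATE_ACTIVE_OPTIMIZED,
    STATE_ACTIVE_NON_OPTIMIZED,
    STATE_STANDBY,
    STATE_UNAVAILABLE,
    STATE_TRANSITIONING,
};

enum submit_verdict {
    SUBMIT_OK,   /* send the command to the target */
    SUBMIT_FAIL, /* fail at submit time; the upper layer sees an error */
};

/*
 * Submit-time check. With fail_transitioning set, a port group already
 * known to be transitioning is rejected before the command is sent.
 * With it cleared, the command is sent and the target's response
 * decides what happens next.
 */
enum submit_verdict submit_check(enum access_state s, bool fail_transitioning)
{
    switch (s) {
    case STATE_ACTIVE_OPTIMIZED:
    case STATE_ACTIVE_NON_OPTIMIZED:
        return SUBMIT_OK;
    case STATE_TRANSITIONING:
        return fail_transitioning ? SUBMIT_FAIL : SUBMIT_OK;
    default:
        /* Assumption: standby and unavailable are not usable for I/O. */
        return SUBMIT_FAIL;
    }
}

/*
 * Completion-time decision for a command that returned NOT READY,
 * transitioning: keep retrying until the implicit transition timeout
 * has elapsed.
 */
bool retry_transitioning(unsigned elapsed_s, unsigned transition_timeout_s)
{
    return elapsed_s < transition_timeout_s;
}
```

In this model, submit_check(STATE_TRANSITIONING, true) fails the command
in well under a second, which is the behaviour Brian describes, while
passing false lets the command through so that retry_transitioning()
governs how long the path is given. Whether multipath should keep
selecting such a path at all is a separate decision that neither
function touches.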
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer