Re: nvme: restore use of blk_path_error() in nvme_complete_rq()

Sagi Grimberg <sagi@xxxxxxxxxxx> · Thu, 6 Aug 2020 18:21:03 -0700

Hey Mike,

The point is: blk_path_error() has nothing to do with NVMe errors.
This is dm-multipath logic stuck in the middle of the NVMe error
handling code.

No, it is a means to have multiple subsystems (to this point both SCSI
and NVMe) doing the correct thing when translating subsystem specific
error codes to BLK_STS codes.

Not exactly, don't find any use of this in scsi. The purpose is to make
sure that nvme and dm speak the same language.

If you, or others you name drop below, understood the point we wouldn't
be having this conversation.  You'd accept the point of blk_path_error()
to be valid and required codification of what constitutes a retryable
path error for the Linux block layer.

This incident is a case where the specific nvme status was designed
to retry on the same path respecting the controller retry delay.
And because nvme used blk_path_error at the time it caused us to use a
non-retryable status to get around that. Granted, no one had 
dm-multipath in mind.

So in a sense, there is consensus on changing patch 35038bffa87da
_because_ nvme no longer uses blk_path_error. Otherwise it would break.

Any BLK_STS mapping of NVMe specific error codes would need to not screw
up by categorizing a retryable error as non-retryable (and vice-versa).

But it is a special type of retryable. There is nothing that fits the
semantics of the current behavior.

Again, assuming proper testing, commit 35038bffa87 wouldn't have made it
upstream if NVMe still used blk_path_error().

Agree.

Yes, your commit 764e9332098c0 needlessly removed NVMe's use of
blk_path_error().  If you're saying it wasn't needless please explain
why.

     Fixes: 764e9332098c0 ("nvme-multipath: do not reset on unknown status")
     Cc: stable@xxxxxxxxxxxxxxxx
     Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
     ---
     drivers/nvme/host/core.c | 9 +++++++--
     1 file changed, 7 insertions(+), 2 deletions(-)

     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
     index 6585d57112ad..072f629da4d8 100644
     --- a/drivers/nvme/host/core.c
     +++ b/drivers/nvme/host/core.c
     @@ -290,8 +290,13 @@ void nvme_complete_rq(struct request *req)
                     nvme_req(req)->ctrl->comp_seen = true;

             if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
     -               if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
     -                       return;
     +               if (blk_path_error(status)) {
     +                       if (req->cmd_flags & REQ_NVME_MPATH) {
     +                               if (nvme_failover_req(req))
     +                                       return;
     +                               /* fallthru to normal error handling */
     +                       }
     +               }

This would basically undo the patch Hannes, Christoph, and I put together in
commit 764e9332098c0.  This patch greatly simplified and improved the
whole nvme_complete_rq() code path, and I don't support undoing that change.

Please elaborate on how I've undone anything?

I think you're right, this hasn't undone the patch from John, but it
breaks NVME_SC_CMD_INTERRUPTED error handling behavior.

The only thing I may have done is forced NVMe to take more care about
properly translating its NVME_SC to BLK_STS in nvme_error_status().
Which is a good thing.

I don't think there is an issue here of mistakenly converting an nvme
status code to a wrong block status code. This conversion is there
because there is no block status that give us the semantics we need
which is apparently specific to nvme.

I personally don't mind restoring blk_path_error to nvme, I don't
particularly feel strong about it either. But for sure blk_path_error
needs to first provide the semantics needed for NVME_SC_CMD_INTERRUPTED.

...

Anyway, no new BLK_STS is needed at this point.  More discipline with
how NVMe's error handling is changed is.

Please read the above.

If NVMe wants to ensure its
interface isn't broken regularly it _should_ use blk_path_error() to
validate future nvme_error_status() changes.  Miscategorizing NVMe
errors to upper layers is a bug -- not open for debate.

Again, don't agree is a Miscategorization nor a bug, its just something
that is NVMe specific.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel