Re: [PATCHv3 3/5] nvme: implement I/O Command Sets Command Set support

Sagi Grimberg <sagi@xxxxxxxxxxx> · Wed, 24 Jun 2020 15:54:05 -0700

On 6/24/20 2:49 PM, Keith Busch wrote:
On Wed, Jun 24, 2020 at 12:03:41PM -0700, Sagi Grimberg wrote:

If the controller does not support the CNS value, it may return Invalid
Field with DNR set. That error currently gets propogated back to
nvme_init_ns_head(), which then abandons the namespace. Just as the code
coments say, we had been historically been clearing such errors because
we have other ways to identify the namespace, but now we're not clearing
that error.

I don't understand what you are saying Keith.

My comment was for this block:
--
       if (!status && nvme_multi_css(ctrl) && !csi_seen) {
           dev_warn(ctrl->device, "Command set not reported for nsid:%d\n",
                nsid);
           status = -EINVAL;
       }
--

I was saying that !status doesn't necessarily mean success, but it can
also mean that its an retry-able error status (due to transport or
controller). If we see a retry-able error we should still clear the
status so we don't abandon the namespace.

This for example would achieve that:

We're not talking about the same thing. I am only talking about what
introduced the DNR check, from commit 59c7c3caaaf87.

I know you added it because you want to abort comparing identifiers on a
rescan when retrieving the identifiers failed. That's fine, but I am
saying this fails namespace creation in the first place for some types
of devices that used to succeed.

OK, now I think I understand (thanks for clarifying that the discussion
is not on patch 3/5, but rather on 59c7c3caaaf87).

So the original proposal was to check NVME_SC_HOST_PATH_ERROR (and now
we have NVME_SC_HOST_ABORTED_CMD) but with the review Christoph gave
it seemed to make more sense that we generalize and check the DNR bit.

Okay, I didn't question this approach when it first went through, so
sorry about this digression, but I really don't get how this DNR check
helps at all.

The code currently returns an error if DNR is set.

Right.

Based on the commit
messages, it sounds like you need that error to skip comparing
identifiers through nvme's scan_work calling revalidate_disk():
suppressing the error has revalidate_disk() fail with -ENODEV when
comparing identifiers fails.

Your understanding is correct.

Since we do return the error when DNR is set, we skip comparing
identifiers and return blk_status_to_errno(nvme_error_status(ret))
instead. How is that errno an improvement?

See nvme_revalidate_disk:
out:
        /*
         * Only fail the function if we got a fatal error back from the
         * device, otherwise ignore the error and just move on.
         */
        if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
                ret = 0;
        else if (ret > 0)
                ret = blk_status_to_errno(nvme_error_status(ret));
        return ret;

So we don't actually propagate the error back if its a non-DNR (hence
retry-able error). The errno was there before in order to not leak NVMe
errors to the block layer.

And then if DNR is not set, we suppress the error and proceed with
comparing identifiers??

Wait, I think that I re-read it it's backwards. The intent was to indeed
clear the error for the DNR case and propagate the error for the non-DNR
case!

The code needs to be:
--

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2afed32d3892..3e84ab6c2bd3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1130,7 +1130,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl 
*ctrl, unsigned nsid,
                  * Don't treat an error as fatal, as we potentially 
already
                  * have a NGUID or EUI-64.
                  */
-               if (status > 0 && !(status & NVME_SC_DNR))
+               if (status > 0 && (status & NVME_SC_DNR))
                        status = 0;
                goto free_data;
        }
--

This way, if the controller failed the identify it will be with DNR
status and we will silently ignore, and if the transport failed its
a non-DNR status, and we propagate the status back, skip the id compare,
and then silently ignore the error in nvme_revalidate_disk (as above).

Looking into the original fix we had internally, this was the case, and
got fat-fingered in I can only assume...

Will send a fix right away, good catch keith!