On 8/13/20, 10:48 AM, "Mike Snitzer" <snitzer@xxxxxxxxxx> wrote:

    Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
    status"), among other things, fixed NVME_SC_CMD_INTERRUPTED error
    handling by changing multipathing's nvme_failover_req() to
    short-circuit path failover and then fall back to NVMe's normal error
    handling (which takes care of NVME_SC_CMD_INTERRUPTED).

    This detour through native NVMe multipathing code is unwelcome because
    it prevents NVMe core from handling NVME_SC_CMD_INTERRUPTED independent
    of any multipathing concerns.

    Introduce nvme_status_needs_local_error_handling() to prioritize
    non-failover retry, when appropriate, in terms of normal NVMe error
    handling.  nvme_status_needs_local_error_handling() will naturally
    evolve to include handling of any other errors that normal error
    handling must be used for.

How is this any better than blk_path_error()?

    nvme_failover_req()'s ability to fall back to normal NVMe error
    handling has been preserved because it may be useful for future
    NVME_SC that nvme_status_needs_local_error_handling() hasn't been
    trained for yet.

    Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
    ---
     drivers/nvme/host/core.c | 16 ++++++++++++++--
     1 file changed, 14 insertions(+), 2 deletions(-)

    diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    index 88cff309d8e4..be749b690af7 100644
    --- a/drivers/nvme/host/core.c
    +++ b/drivers/nvme/host/core.c
    @@ -252,6 +252,16 @@ static inline bool nvme_req_needs_retry(struct request *req)
     	return true;
     }

    +static inline bool nvme_status_needs_local_error_handling(u16 status)
    +{
    +	switch (status & 0x7ff) {
    +	case NVME_SC_CMD_INTERRUPTED:
    +		return true;
    +	default:
    +		return false;
    +	}
    +}

I assume that what nvme_status_needs_local_error_handling() is meant to answer is: should the NVMe core driver handle the command retry? If so, I don't think this function will ever work correctly because,
as discussed, whether or not the command needs to be retried has nothing to do with the NVMe Status Code field itself; it has to do with the DNR bit, the CRD field, and the Status Code Type field, in that order.

    +
     static void nvme_retry_req(struct request *req)
     {
     	struct nvme_ns *ns = req->q->queuedata;
    @@ -270,7 +280,8 @@ static void nvme_retry_req(struct request *req)

     void nvme_complete_rq(struct request *req)
     {
    -	blk_status_t status = nvme_error_status(nvme_req(req)->status);
    +	u16 nvme_status = nvme_req(req)->status;
    +	blk_status_t status = nvme_error_status(nvme_status);

     	trace_nvme_complete_rq(req);

    @@ -280,7 +291,8 @@ void nvme_complete_rq(struct request *req)
     		nvme_req(req)->ctrl->comp_seen = true;

     	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
    -		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
    +		if (!nvme_status_needs_local_error_handling(nvme_status) &&

This defeats the nvme-multipath logic by inserting a second evaluation of the NVMe Status Code into the retry logic. This is basically another version of blk_path_error(). In fact, in your case REQ_NVME_MPATH is probably not set, so I don't see what difference this would make at all.

/John

    +		    (req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
     			return;

     		if (!blk_queue_dying(req->q)) {
    --
    2.18.0

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
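
To make the ordering argument above concrete (DNR bit first, then the CRD field, then the Status Code Type), here is a rough user-space sketch. It is not kernel code and not a proposed patch. Every sketch_* and SKETCH_* name is hypothetical. The bit positions are an assumption: they follow the status layout with the phase bit already shifted out, which is how I understand nvme_req(req)->status to be stored, with SC in bits 7:0, SCT in bits 10:8, CRD in bits 12:11 and DNR in bit 14. Treating SCT 0x3 as "path related" is likewise an assumption. The function is only meant to be called for a command that completed with an error.

```c
#include <stdint.h>

/* Hypothetical masks; assumed layout with the phase bit already removed. */
#define SKETCH_SCT_MASK   0x0700u  /* Status Code Type, bits 10:8 */
#define SKETCH_SCT_SHIFT  8
#define SKETCH_CRD_MASK   0x1800u  /* Command Retry Delay, bits 12:11 */
#define SKETCH_DNR        0x4000u  /* Do Not Retry, bit 14 */
#define SKETCH_SCT_PATH   0x3u     /* assumed: path-related status type */

enum sketch_disposition {
	SKETCH_COMPLETE,       /* DNR set: fail the command, no retry */
	SKETCH_RETRY_DELAYED,  /* CRD non-zero: retry on the same path after a delay */
	SKETCH_FAILOVER,       /* path-related SCT: candidate for another path */
	SKETCH_RETRY_LOCAL,    /* anything else: plain local retry */
};

/*
 * Classify a failed command by looking at DNR, then CRD, then SCT.
 * The Status Code value itself (bits 7:0) is never consulted.
 */
static inline enum sketch_disposition sketch_classify(uint16_t status)
{
	if (status & SKETCH_DNR)
		return SKETCH_COMPLETE;

	if (status & SKETCH_CRD_MASK)
		return SKETCH_RETRY_DELAYED;

	if (((status & SKETCH_SCT_MASK) >> SKETCH_SCT_SHIFT) == SKETCH_SCT_PATH)
		return SKETCH_FAILOVER;

	return SKETCH_RETRY_LOCAL;
}
```

With these assumptions, a status of 0x0821 (SC 0x21 with CRD index 1) would classify as a delayed local retry without the code ever comparing against a specific status code, while 0x0303 (SCT 3) would classify as a failover candidate.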