This is an update to say that I've tested this patch and it works as expected.

When the controller returns a Command Interrupted status, the request avoids
nvme_failover_req() and goes down the nvme_retry_req() path, where the CRD is
implemented and the command is retried after a delay.

If the controller returns Command Interrupted too many times and
nvme_req(req)->retries runs down, a device resource error is returned to the
block layer. But I think we'll have this problem with any error.

[Tue Dec 3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 4610048
[Tue Dec 3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 7112704

The alternative is to stop incrementing nvme_req(req)->retries in
nvme_retry_req() when CRD is set.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 24dc9ed1a11b..ec9794698a20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req)
 	crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
 	if (ns && crd)
 		delay = ns->ctrl->crdt[crd - 1] * 100;
-
-	nvme_req(req)->retries++;
+	else
+		nvme_req(req)->retries++;
 	blk_mq_requeue_request(req, false);
 	blk_mq_delay_kick_requeue_list(req->q, delay);
 }

Thoughts?

/John

On 11/27/19, 2:12 PM, "Meneghini, John" <John.Meneghini@xxxxxxxxxx> wrote:

    From: John Meneghini <johnm@xxxxxxxxxx>

    - Fixes bug in nvme_complete_rq logic introduced by
      Enhanced Command Retry code. According to TP-4033
      when ACRE is enabled the host needs to support
      the Command Interrupted status.

    - The current code interprets Command Interrupted status
      as a BLK_STS_IOERR. This results in a controller reset
      when REQ_NVME_MPATH is set; in nvme_failover_req.

    Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry")
    Signed-off-by: John Meneghini <johnm@xxxxxxxxxx>
    Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
    ---
     drivers/nvme/host/core.c  | 2 ++
     include/linux/blk_types.h | 1 +
     2 files changed, 3 insertions(+)

    diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    index 9696404a6182..24dc9ed1a11b 100644
    --- a/drivers/nvme/host/core.c
    +++ b/drivers/nvme/host/core.c
    @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
     		return BLK_STS_NEXUS;
     	case NVME_SC_HOST_PATH_ERROR:
     		return BLK_STS_TRANSPORT;
    +	case NVME_SC_CMD_INTERRUPTED:
    +		return BLK_STS_DEV_RESOURCE;
     	default:
     		return BLK_STS_IOERR;
     	}
    diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
    index d688b96d1d63..6efee8f1b91b 100644
    --- a/include/linux/blk_types.h
    +++ b/include/linux/blk_types.h
    @@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error)
     	case BLK_STS_NEXUS:
     	case BLK_STS_MEDIUM:
     	case BLK_STS_PROTECTION:
    +	case BLK_STS_DEV_RESOURCE:
     		return false;
     	}

    --
    2.21.0
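
For anyone following the thread, here is a small user-space sketch (not kernel code) of the retry accounting discussed in the reply at the top of this mail. It only models the two behaviours being compared: always charging a retry, versus charging a retry only when no CRD is set. The names sketch_req, sketch_retry and skip_count_on_crd are hypothetical, the 0x1800 mask is an assumption chosen to match the ">> 11" shift in nvme_retry_req(), and the "ns" check from the driver is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask for the 2-bit CRD field, matching the ">> 11" in the patch. */
#define SKETCH_SC_CRD_MASK  0x1800u
#define SKETCH_SC_CRD_SHIFT 11

/* Hypothetical stand-in for the per-request state kept by the driver. */
struct sketch_req {
	uint16_t status;   /* completion status, CRD index in bits 12:11 */
	unsigned retries;  /* retries charged against the budget so far */
};

/* Extract the CRD index (0 means no retry delay was requested). */
static unsigned sketch_crd(uint16_t status)
{
	return (status & SKETCH_SC_CRD_MASK) >> SKETCH_SC_CRD_SHIFT;
}

/*
 * Model one pass through the retry path and return the requeue delay in ms.
 * crdt[] holds the three controller-supplied delay values in 100 ms units.
 * With skip_count_on_crd == false a retry is always charged (current code);
 * with skip_count_on_crd == true a retry is charged only when CRD is zero
 * (the alternative proposed in the reply).
 */
static unsigned sketch_retry(struct sketch_req *req, const uint16_t crdt[3],
			     bool skip_count_on_crd)
{
	unsigned crd = sketch_crd(req->status);
	unsigned delay_ms = 0;

	if (crd)
		delay_ms = crdt[crd - 1] * 100u;

	if (!crd || !skip_count_on_crd)
		req->retries++;

	return delay_ms;
}
```

With the alternative behaviour, a controller that keeps answering with a non-zero CRD never exhausts the retry budget, so the command is retried for as long as the controller asks for a delay; whether that is desirable is the open question in this thread.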