When queueing a request fails, its blk_status_t is returned directly
to blk-mq. If the status is none of BLK_STS_RESOURCE,
BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE, blk-mq calls
blk_mq_end_request() to complete the request with BLK_STS_IOERR.
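For reference, the dispatch-side handling looks roughly like this (a
paraphrased sketch of the switch in blk_mq_dispatch_rq_list(), not the
exact upstream code):

	ret = q->mq_ops->queue_rq(hctx, &bd);
	switch (ret) {
	case BLK_STS_OK:
		queued++;
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/* put the request back and retry dispatch later */
		blk_mq_handle_dev_resource(rq, list);
		goto out;
	case BLK_STS_ZONE_RESOURCE:
		/* zone resource contention, also retried later */
		blk_mq_handle_zone_resource(rq, &zone_list);
		break;
	default:
		/* any other status ends the request with an error */
		errors++;
		blk_mq_end_request(rq, ret);
		break;
	}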
There are two scenarios in which the request should instead be retried,
and may then succeed. First, with nvme native multipath, the request
may be retried successfully on another path, because the error is
probably path related. Second, without multipath software, the request
may be retried successfully after error recovery.
If the request is completed with BLK_STS_IOERR in
blk_mq_dispatch_rq_list, its state may still be MQ_RQ_IN_FLIGHT,
because blk_mq_end_request() is called without first setting the state
to MQ_RQ_COMPLETE. If the request is freed asynchronously, as in
nvme_submit_user_cmd, then in an extreme scenario the request will be
freed a second time during tear down.
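An illustrative interleaving of the race (timeline sketch only; the
tear-down path shown is the usual tag iteration plus cancel):

	/*
	 * CPU A (dispatch)                     CPU B (error recovery)
	 *
	 * queue_rq() returns BLK_STS_IOERR
	 * blk_mq_end_request(rq, BLK_STS_IOERR)
	 *   (state never set to MQ_RQ_COMPLETE)
	 * rq freed asynchronously,             blk_mq_tagset_busy_iter()
	 * e.g. via nvme_submit_user_cmd          nvme_cancel_request(rq)
	 *                                         -> rq completed and
	 *                                            freed a second time
	 */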
If a non-resource error occurs in queue_rq, the driver should instead
call nvme_complete_rq() to complete the request, setting the request
state to MQ_RQ_COMPLETE first. nvme_complete_rq() will then decide
whether to retry, fail over, or end the request.
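For reference, the disposition logic is roughly the following
(simplified from nvme_complete_rq() in drivers/nvme/host/core.c):

	void nvme_complete_rq(struct request *req)
	{
		...
		switch (nvme_decide_disposition(req)) {
		case COMPLETE:
			nvme_end_req(req);
			return;
		case RETRY:
			nvme_retry_req(req);
			return;
		case FAILOVER:
			/* requeue on another path via nvme-multipath */
			nvme_failover_req(req);
			return;
		}
	}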
Signed-off-by: Chao Leng <lengchao@xxxxxxxxxx>
---
drivers/nvme/host/rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index df9f6f4549f1..4a89bf44ecdc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 unmap_qe:
 	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
 			DMA_TO_DEVICE);
-	return ret;
+	return nvme_try_complete_failed_req(rq, ret);
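(The helper itself is not part of the quoted hunk. Going by the
description above and the discussion below, it would be shaped roughly
like the following; the body here is a reconstruction, not the actual
patch:)

	static inline blk_status_t
	nvme_try_complete_failed_req(struct request *rq, blk_status_t ret)
	{
		switch (ret) {
		case BLK_STS_RESOURCE:
		case BLK_STS_DEV_RESOURCE:
		case BLK_STS_ZONE_RESOURCE:
			return ret;	/* let blk-mq requeue and retry */
		default:
			/*
			 * Complete in the driver so nvme_complete_rq() can
			 * retry or fail over instead of blk-mq ending the
			 * request with BLK_STS_IOERR. Mark MQ_RQ_COMPLETE
			 * first to close the race with error recovery.
			 */
			nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
			blk_mq_set_request_complete(rq);
			nvme_complete_rq(rq);
			return BLK_STS_OK;
		}
	}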
I don't understand this. There are errors that may not be related to
anything path related (sw bug, memory leak, mapping error, etc.), so
why should we return this one-shot error?
Although failover retry is not required in such cases, if we return
the error to blk-mq, a low-probability crash may happen, because
blk-mq does not set the request state to MQ_RQ_COMPLETE before
completing the request, and the request may be freed asynchronously,
as in nvme_submit_user_cmd. If this races with error recovery, a
double completion of the request may happen.
Then fix that, don't work around it.
I'm not trying to work around it. The purpose of this is to solve the
problem for nvme native multipathing at the same time.
Please explain how this is an nvme-multipath issue?
So we cannot return the error to blk-mq if the blk_status_t is not
BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE.
This is not something we should be handling in nvme. Block drivers
should be able to fail queue_rq, and all of this should live in the
block layer.
Of course, fixing this in the block drivers directly is also an
option. However, the block layer is unaware of nvme native
multipathing,
Nor should it be.
which will cause the request to be completed with an error; this
should be avoided.
Not sure I understand.. requests should fail over for path-related
errors. What queue_rq errors are expected to be failed over, from your
perspective?
The scenario: use two HBAs for nvme native multipath, and then one
HBA fails.
What is the specific error the driver sees?
The blk_status_t returned by queue_rq is BLK_STS_IOERR, and blk-mq
will call blk_mq_end_request to complete the request, which bypasses
nvme native multipath. We expect the request to fail over to the
healthy HBA, but instead it is completed directly with BLK_STS_IOERR.
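For context, the error mapping at the tail of nvme_rdma_queue_rq() is
along these lines (paraphrased sketch): when the HBA is down, the post
fails with an errno other than -ENOMEM/-EAGAIN, so queue_rq returns
BLK_STS_IOERR:

	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
			req->mr ? &req->reg_wr.wr : NULL);
	if (unlikely(err))
		goto err_unmap;
	...
err:
	if (err == -ENOMEM || err == -EAGAIN)
		ret = BLK_STS_RESOURCE;	/* blk-mq will requeue */
	else
		ret = BLK_STS_IOERR;	/* blk-mq ends the request */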
Both scenarios can be fixed by completing the request directly in
queue_rq.
Well, certainly this one-shot "always return 0 and complete the
command with a HOST_PATH error" is not a good approach IMO.