Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion

Sagi Grimberg <sagi@xxxxxxxxxxx> · Wed, 20 Jan 2021 13:35:47 -0800

is not something we should be handling in nvme. Block drivers
should be able to fail queue_rq, and all of this should live in the
block layer.
Of course, fixing the block drivers directly is also an option.
However, the block layer is unaware of nvme native multipathing,

Nor should it be.

which will cause the request to return an error, and that should be avoided.

Not sure I understand.
Requests should fail over for path-related errors. Which queue_rq
errors do you expect to be failed over?
Although failing over only for path-related errors would be the best
choice, it's almost impossible to achieve.
The probability of non-path-related errors is very low. Although these
errors do not require a failover retry, the only cost of retrying them
is that the request is completed with an error somewhat later (after
being retried several times). It's not the best choice, but I think it's
acceptable, because HBA drivers do not have path-related error codes,
only general error codes, and it is difficult to tell whether a general
error code is path-related.

If we have a SW bug or breakage that can happen occasionally, this can
result in a constant failover rather than a simple failure. This is just
not a good approach IMO.

The scenario: two HBAs are used for nvme native multipath, and then one
HBA fails,

What is the specific error the driver sees?
The path-related error code depends heavily on the HBA driver
implementation. In general it is EIO. I don't think it's a good idea to
assume which general error code the driver returns in the event of a path
error.

But is assuming that every error is a path error a good idea?

the blk_status_t returned by queue_rq is BLK_STS_IOERR, and blk-mq calls
blk_mq_end_request to complete the request, which bypasses nvme native
multipath. We expect the request to fail over to the healthy HBA, but
instead it is completed directly with BLK_STS_IOERR.
Both scenarios can be fixed by completing the request directly in
queue_rq.
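To make the two completion paths concrete, here is a simplified
userspace model (not kernel code) of what blk-mq does with the status
returned by queue_rq. The enum and function names are hypothetical; only
the status names echo the real blk_status_t codes, and other codes such
as BLK_STS_DEV_RESOURCE are left out.

```c
/*
 * Userspace model, NOT kernel code. model_dispatch() is a hypothetical
 * stand-in for the status switch in the blk-mq dispatch loop.
 */
#include <assert.h>

enum model_blk_status {
	MODEL_STS_OK,       /* driver accepted the request */
	MODEL_STS_RESOURCE, /* temporary shortage, e.g. -ENOMEM/-EAGAIN */
	MODEL_STS_IOERR,    /* any other failure */
};

enum model_outcome {
	OUTCOME_QUEUED,      /* completion arrives later through the driver */
	OUTCOME_REQUEUED,    /* blk-mq retries the dispatch later */
	OUTCOME_ENDED_ERROR, /* blk-mq ends the request itself */
};

static enum model_outcome model_dispatch(enum model_blk_status ret)
{
	switch (ret) {
	case MODEL_STS_OK:
		return OUTCOME_QUEUED;
	case MODEL_STS_RESOURCE:
		return OUTCOME_REQUEUED;
	default:
		/*
		 * Models blk_mq_end_request(rq, ret): the request is
		 * completed right here, so the nvme completion path that
		 * decides whether to fail over never sees it.
		 */
		return OUTCOME_ENDED_ERROR;
	}
}
```

In this model only MODEL_STS_OK leaves the request on a path where a
later host path error completion could still trigger failover.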
Well, this one-shot approach of always returning 0 and completing the
command with a HOST_PATH error is certainly not a good approach IMO.
So what's the better option? Only complete the request with a host path
error when the HBA driver returns something other than ENOMEM or EAGAIN?
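A minimal sketch of the mapping asked about here, assuming the HBA post
path reports plain negative errno values: -ENOMEM and -EAGAIN stay
retryable, and everything else is completed with a host path error so
nvme multipath can fail it over. map_post_err() and its enum are
hypothetical names for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical actions a queue_rq implementation could take. */
enum post_err_action {
	ACTION_QUEUED,          /* post succeeded */
	ACTION_RETURN_RESOURCE, /* return BLK_STS_RESOURCE so blk-mq requeues */
	ACTION_HOST_PATH_ERROR, /* complete with a host path error (failover) */
};

static enum post_err_action map_post_err(int err)
{
	if (err == 0)
		return ACTION_QUEUED;
	/* Transient resource shortage: retry on the same queue. */
	if (err == -ENOMEM || err == -EAGAIN)
		return ACTION_RETURN_RESOURCE;
	/* Everything else is treated as a path error (the disputed part). */
	return ACTION_HOST_PATH_ERROR;
}
```

Note that an -EINVAL caused by a driver bug also lands in the failover
branch, which is the constant-failover concern raised earlier in the
thread.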

Well, the correct thing to do here would be to clone the bio and
fail over if the end_io error status is BLK_STS_IOERR. That sucks
because it adds overhead, but this proposal doesn't sit well with me;
it looks wrong.
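A rough synchronous model of the clone-and-failover idea, assuming each
path can be represented as a function that returns the clone's end_io
status. submit_with_failover() and the path_* helpers are hypothetical;
a real implementation would clone the bio and resubmit from the clone's
end_io callback rather than loop. Bounding the retries by the number of
paths is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum model_status { MST_OK, MST_IOERR, MST_OTHER_ERR };

/* A path "executes" one cloned bio and returns its end_io status. */
typedef enum model_status (*model_path_fn)(void);

/* Hypothetical example paths. */
static enum model_status path_down(void)   { return MST_IOERR; }
static enum model_status path_up(void)     { return MST_OK; }
static enum model_status path_sw_bug(void) { return MST_OTHER_ERR; }

/*
 * Submit a clone on each path in turn. Only an IOERR status triggers a
 * retry on the next path; any other status completes the original bio.
 * *attempts receives the number of clones submitted.
 */
static enum model_status submit_with_failover(const model_path_fn *paths,
					      size_t npaths, size_t *attempts)
{
	enum model_status st = MST_IOERR;
	size_t n = 0;

	assert(paths != NULL && npaths > 0);
	for (size_t i = 0; i < npaths; i++) {
		n++;
		st = paths[i]();   /* end_io status of this clone */
		if (st != MST_IOERR)
			break;     /* success or a non-failover error */
	}
	if (attempts)
		*attempts = n;
	return st;             /* status delivered to the original bio */
}
```

The extra clone per I/O is the overhead mentioned above; an error other
than IOERR is returned to the original bio unchanged instead of being
retried on another path.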

Alternatively, a more creative idea would be to encode the error
status somehow in the cookie returned from submit_bio, but that
also feels like a small(er) hack..
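One purely hypothetical way the cookie idea could look: reserve a flag
bit in a 32-bit cookie and carry the error status in the low bits. The
layout and helper names are invented for illustration; a real scheme
would have to avoid colliding with cookie values the block layer already
gives meaning to, which is part of why it feels like a hack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top bit = "carries an error", low 8 bits = status. */
#define COOKIE_ERR_FLAG    (UINT32_C(1) << 31)
#define COOKIE_STATUS_MASK UINT32_C(0xff)

static uint32_t cookie_encode_error(uint8_t status)
{
	return COOKIE_ERR_FLAG | status;
}

static int cookie_has_error(uint32_t cookie)
{
	return (cookie & COOKIE_ERR_FLAG) != 0;
}

static uint8_t cookie_error_status(uint32_t cookie)
{
	/* Only meaningful for cookies produced by cookie_encode_error(). */
	assert(cookie_has_error(cookie));
	return (uint8_t)(cookie & COOKIE_STATUS_MASK);
}
```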