Re: [PATCH v2 0/6] avoid repeated request completion and IO error

Chao Leng <lengchao@xxxxxxxxxx> · Thu, 14 Jan 2021 14:50:02 +0800

On 2021/1/14 8:15, Sagi Grimberg wrote:
>> First avoid repeated request completion for nvmf_fail_nonready_command.
>> Second avoid IO error and repeated request completion for queue_rq.
>
> Maybe this is me chiming in v2, but what is this fixing? what
> is the bug you are seeing?

The bugs are a crash and an I/O error, seen in two scenarios.
First, when a request timeout is injected, a crash occurs because a
request is completed twice; the probability is very low. The reason:
a request timeout triggers error recovery. During error recovery, a new
request is completed by nvmf_fail_nonready_command in queue_rq, and the
state of the request is changed to MQ_RQ_IN_FLIGHT. The request is freed
asynchronously in nvme_submit_user_cmd, which may run only after error
recovery has cancelled the request (the cancel sees the request in
state MQ_RQ_IN_FLIGHT). The request is then completed twice.
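
To make the ordering concrete, here is a small user-space model of the
race, not kernel code. All names (model_rq, model_fail_nonready,
model_cancel, model_free, model_run) are made up for illustration, and
the three states are simplified stand-ins for the blk-mq request states.
It only assumes what is described above: the request is marked in flight
and completed in queue_rq, freed later by the submitter, and the cancel
pass completes anything it still sees as in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for request states; not the kernel definitions. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct model_rq {
	enum rq_state state;
	int completions;	/* how many times the request was completed */
};

/* Models queue_rq failing a non-ready command: the request is marked
 * started and completed, but the submitter frees it only later. */
static void model_fail_nonready(struct model_rq *rq)
{
	rq->state = RQ_IN_FLIGHT;
	rq->completions++;
}

/* Models the cancel pass of error recovery: it completes every request
 * that still looks in flight. Returns true if it completed the request. */
static bool model_cancel(struct model_rq *rq)
{
	if (rq->state != RQ_IN_FLIGHT)
		return false;
	rq->completions++;
	rq->state = RQ_COMPLETE;
	return true;
}

/* Models the asynchronous free done by the submitter. */
static void model_free(struct model_rq *rq)
{
	rq->state = RQ_IDLE;
}

/* Runs one ordering and returns the number of completions. */
static int model_run(bool cancel_before_free)
{
	struct model_rq rq = { RQ_IDLE, 0 };

	model_fail_nonready(&rq);
	if (cancel_before_free) {
		model_cancel(&rq);	/* still in flight: second completion */
		model_free(&rq);
	} else {
		model_free(&rq);	/* freed first: cancel skips it */
		model_cancel(&rq);
	}
	assert(rq.completions >= 1);
	return rq.completions;
}
```

In this model, model_run(true) yields two completions (the bad ordering)
and model_run(false) yields one, which matches why the crash is rare: it
needs the cancel to win the race against the free.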

Second, with two HBAs used for NVMe native multipath, injecting a fault
on one HBA causes I/O errors and, with low probability, a crash. The
I/O error occurs because queue_rq returns a blk_status_t of
BLK_STS_IOERR, so blk-mq calls blk_mq_end_request to complete the
request. We expect the request to fail over to the healthy HBA, but
instead it is completed directly with BLK_STS_IOERR. The cause of the
crash is similar to the first scenario.
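
The difference between the two outcomes can be sketched with another toy
user-space model. Again, every name here is hypothetical and nothing is
a kernel API. The first branch models what is described above (the error
status is returned from queue_rq and the request is ended directly). The
second branch is only an assumed multipath-aware completion that retries
on another live path, shown for contrast.

```c
#include <assert.h>
#include <stdbool.h>

enum model_status { MODEL_STS_OK, MODEL_STS_IOERR };

enum model_outcome {
	MODEL_DONE_OK,
	MODEL_DONE_IOERR,		/* submitter sees an I/O error */
	MODEL_REQUEUED_OTHER_PATH	/* request retried on another path */
};

/* Models the block layer ending a request itself after queue_rq
 * returned an error: the status goes straight to the submitter and
 * no path selection takes place. */
static enum model_outcome model_end_request(enum model_status sts)
{
	return sts == MODEL_STS_OK ? MODEL_DONE_OK : MODEL_DONE_IOERR;
}

/* Models a multipath-aware completion (an assumption for this sketch):
 * on a path error, retry on another path if one is still alive. */
static enum model_outcome
model_complete_multipath(enum model_status sts, bool other_path_alive)
{
	if (sts == MODEL_STS_OK)
		return MODEL_DONE_OK;
	return other_path_alive ? MODEL_REQUEUED_OTHER_PATH
				: MODEL_DONE_IOERR;
}

/* Outcome of submitting a request on the faulted path. */
static enum model_outcome
model_submit_on_failed_path(bool error_returned_by_queue_rq,
			    bool other_path_alive)
{
	if (error_returned_by_queue_rq)
		return model_end_request(MODEL_STS_IOERR);
	return model_complete_multipath(MODEL_STS_IOERR, other_path_alive);
}
```

With a healthy second path, model_submit_on_failed_path(true, true)
still ends in MODEL_DONE_IOERR, while
model_submit_on_failed_path(false, true) ends in
MODEL_REQUEUED_OTHER_PATH, which is the failover we expect.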