The NVMe TCP timeout handler is allowed to abort a request directly when the controller isn't in the LIVE state. nvme_tcp_error_recovery() moves the controller state to RESETTING and schedules the reset work function. If a new timeout fires before the work function runs, the newly timed-out request is aborted directly; however, at that point the controller hasn't been shut down yet, so a race between the timeout abort and normal completion is triggered.
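For reference, a simplified sketch of the timeout-handler flow being described, abridged from the shape of drivers/nvme/host/tcp.c (details and the exact signature vary across kernel versions):

/*
 * Simplified sketch of the nvme-tcp timeout handler flow described above
 * (abridged; see drivers/nvme/host/tcp.c for the real code).
 */
static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
{
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;

	if (ctrl->state != NVME_CTRL_LIVE) {
		/*
		 * Controller is tearing down or (re)connecting: complete the
		 * timed-out request directly so teardown/setup is not blocked.
		 */
		nvme_tcp_complete_timed_out(rq);
		return BLK_EH_DONE;
	}

	/*
	 * LIVE controller: move to RESETTING and schedule the reset work;
	 * the request will be handled once recovery tears the queues down.
	 */
	nvme_tcp_error_recovery(ctrl);
	return BLK_EH_RESET_TIMER;
}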
This assertion is incorrect: before completing the request from the timeout handler, we call nvme_tcp_stop_queue(), which guarantees that upon return no more completions will be seen from this queue.
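A sketch of the completion path referred to here (abridged; the upstream helper differs slightly across versions): the queue is stopped before the request is completed, so a normal completion cannot race with the abort.

/*
 * Abridged sketch: stop the queue *before* completing the timed-out
 * request, so no normal completion can race with the abort.
 */
static void nvme_tcp_complete_timed_out(struct request *rq)
{
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;

	/* After this returns, the queue can no longer deliver completions. */
	nvme_tcp_stop_queue(ctrl, nvme_tcp_queue_id(req->queue));

	/* Now it is safe to complete the request from the timeout handler. */
	nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
	blk_mq_complete_request(rq);
}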
Fix the race with the following approach: 1) Abort a timed-out request directly only when the controller is in the CONNECTING or DELETING state. In these two states the controller has already been shut down, so it is safe to do so. This is also sufficient for recovering the controller, because during RESETTING we only stop/destroy queues and cancel all in-flight requests; no new request is required in RESETTING.
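Relative to the handler sketch above, point 1) amounts to narrowing the state check; an illustrative fragment of that shape (not the exact patch):

	/*
	 * Proposed narrowing (sketch): only complete directly when the
	 * queues have already been torn down; in RESETTING, leave the
	 * request to the reset work instead of aborting it here.
	 */
	if (ctrl->state == NVME_CTRL_CONNECTING ||
	    ctrl->state == NVME_CTRL_DELETING) {
		nvme_tcp_complete_timed_out(rq);
		return BLK_EH_DONE;
	}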
Unfortunately, RESETTING also requires direct completion, because this state may include I/O that can time out, and unless we complete it the reset flow cannot make forward progress (nvme_disable_ctrl()/nvme_shutdown_ctrl() generate I/O on fabrics).
2) Delay unquiescing the I/O queues and the admin queue until the controller is LIVE, since it isn't necessary to start the queues during RESETTING. Starting them earlier would instead risk the timeout abort vs. normal completion race, because we need to abort timed-out requests directly during the CONNECTING state in order to set up the controller.
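An illustrative sketch of what point 2) would look like, under the assumption that the unquiesce calls move out of the teardown path and only run after setup succeeds; the wrapper name below is hypothetical, and the quiesce/unquiesce helpers have been renamed across kernel versions:

/*
 * Illustrative sketch only: unquiesce the admin and I/O queues after the
 * controller has transitioned back to LIVE, instead of from the teardown
 * path while still RESETTING/CONNECTING.
 */
static void nvme_tcp_setup_ctrl_done(struct nvme_ctrl *ctrl)	/* hypothetical helper */
{
	if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE)) {
		blk_mq_unquiesce_queue(ctrl->admin_q);	/* admin queue */
		nvme_start_queues(ctrl);		/* I/O queues  */
	}
}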
We can't unquiesce I/O only when the controller is LIVE, because I/O needs to be able to fail over for multipath, and that should not be tied to the controller becoming LIVE again whatsoever...