Re: [PATCH 3/4] nvme: tcp: fix race between timeout and normal completion

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 20 Oct 2020 17:44:20 +0800

On Tue, Oct 20, 2020 at 01:11:11AM -0700, Sagi Grimberg wrote:
> 
> > NVMe TCP timeout handler allows to abort request directly when the
> > controller isn't in LIVE state. nvme_tcp_error_recovery() updates
> > controller state as RESETTING, and schedule reset work function. If
> > new timeout comes before the work function is called, the new timedout
> > request will be aborted directly, however at that time, the controller
> > isn't shut down yet, then timeout abort vs. normal completion race
> > will be triggered.
> 
> This assertion is incorrect, the before completing the request from
> the timeout handler, we call nvme_tcp_stop_queue, which guarantees upon
> return that no more completions will be seen from this queue.

OK, then looks the issue can be fixed by patch 1 & 2 only.

Yi, can you test again and see if the issue can be fixed by patch 1 & 2?

Thanks,
Ming