On Tue, Oct 20, 2020 at 01:11:11AM -0700, Sagi Grimberg wrote:
>
> > The NVMe TCP timeout handler allows a request to be aborted directly
> > when the controller isn't in LIVE state. nvme_tcp_error_recovery()
> > sets the controller state to RESETTING and schedules the reset work
> > function. If a new timeout arrives before the work function is called,
> > the newly timed-out request will be aborted directly. However, at that
> > time the controller isn't shut down yet, so a timeout abort vs. normal
> > completion race will be triggered.
>
> This assertion is incorrect: before completing the request from
> the timeout handler, we call nvme_tcp_stop_queue, which guarantees upon
> return that no more completions will be seen from this queue.

OK, then it looks like the issue can be fixed by patches 1 & 2 only.

Yi, can you test again and see if the issue can be fixed by patches 1 & 2?

Thanks,
Ming