On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote:
> Let's consider the normal NVMe timeout code path:
>
> 1) one request is timed out;
>
> 2) controller is shutdown, this timed-out request is requeued from
> nvme_cancel_request(), but can't dispatch because queues are quiesced
>
> 3) reset is done from another context, and this request is dispatched
> again, and completed exactly before returning EH_HANDLED to blk-mq, but
> its state isn't updated to COMPLETE yet.
>
> 4) then double completions are done from both normal completion and timeout
> path.

We're definitely fixing this, but I must admit that's an impressive
cognitive traversal across 5 thread contexts to arrive at that race. :)
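
For anyone following along, the double completion in steps 3) and 4) of the quoted sequence can be modeled in userspace. This is only a rough sketch, not the blk-mq or NVMe code and not necessarily the shape the fix will take: `struct request` here is a toy type, and `req_start()`, `req_try_complete()` and `req_complete_unguarded()` are hypothetical helpers invented for illustration. It shows why letting both paths complete unconditionally runs the end-of-request work twice, and how an atomic IN_FLIGHT -> COMPLETE transition leaves exactly one winner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request states; the names are illustrative, not the kernel's. */
enum req_state { REQ_IDLE, REQ_IN_FLIGHT, REQ_COMPLETE };

/* Toy stand-in for a block request (assumption, not struct request from blk-mq). */
struct request {
	_Atomic int state;
	int completions;	/* how many times end-of-request work ran */
};

/* Mark the request as dispatched and reset the completion counter. */
static void req_start(struct request *rq)
{
	assert(rq != NULL);
	rq->completions = 0;
	atomic_store(&rq->state, REQ_IN_FLIGHT);
}

/*
 * Guarded completion: only the caller that moves the state from
 * IN_FLIGHT to COMPLETE gets to run the end-of-request work.
 * Returns true for the winner and false for everyone else.
 */
static bool req_try_complete(struct request *rq)
{
	int expected = REQ_IN_FLIGHT;

	if (!atomic_compare_exchange_strong(&rq->state, &expected, REQ_COMPLETE))
		return false;
	rq->completions++;
	return true;
}

/*
 * Unguarded completion: models the reported bug, where the normal
 * completion path and the timeout path both complete the request
 * without checking whether the other already did.
 */
static void req_complete_unguarded(struct request *rq)
{
	atomic_store(&rq->state, REQ_COMPLETE);
	rq->completions++;
}
```

Calling `req_complete_unguarded()` once for the normal completion and once for the timeout path leaves `completions` at 2, whereas two calls to `req_try_complete()` leave it at 1, with the second call returning false. The real race also involves the request being requeued and redispatched in between, which this sketch deliberately leaves out.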