On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote:
> I think we've always known it's possible to lose a request during timeout
> handling, but just accepted that possibility. It seems to be causing
> problems, though, leading to unnecessary error escalation and IO failures.
>
> The possibility arises when the block layer marks the request complete
> prior to running the timeout handler. If that request happens to complete
> while the handler is running, the request will be lost, inevitably
> triggering a second timeout.
>
> This patch attempts to shorten the window for this race condition by
> clearing the started flag when the driver completes a request. The block
> layer's timeout handler will then complete the command if it observes
> that the started flag is no longer set.
>
> Note it's possible to lose the command even with this patch; it's just
> less likely to happen.

Hello Keith,

Are you sure the root cause of this race condition is in the blk-mq core? I have never observed such behavior in any of my numerous scsi-mq tests, which do trigger timeouts. Are you sure the race you observed is not caused by a blk_mq_reinit_tagset() call, a function that is only used by the NVMe driver and not by any other block driver?

Bart.
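
[Editorial note: for readers following along, a minimal userspace sketch of the window-shortening approach Keith describes above. This is not the blk-mq implementation; the names (struct request fields, complete_request, timeout_handler) are hypothetical stand-ins modeled on the patch description.]

    /*
     * Model of the described approach: the driver's completion path
     * clears a "started" flag, and the timeout handler only escalates
     * the timeout if it still observes the flag set. The race is
     * narrowed, not eliminated: a completion can still slip in between
     * the handler's check and its timeout action.
     */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct request {
            atomic_bool started;    /* stands in for REQ_ATOM_STARTED */
    };

    /* Driver completion path: clear the started flag first. */
    static void complete_request(struct request *rq)
    {
            atomic_store(&rq->started, false);
            /* ... normal completion processing would follow ... */
    }

    /*
     * Timeout handler: re-check the started flag. If the driver has
     * already completed the request, skip the timeout instead of
     * escalating and losing the request.
     */
    static void timeout_handler(struct request *rq)
    {
            if (!atomic_load(&rq->started)) {
                    puts("request already completed; skipping timeout");
                    return;
            }
            puts("request still pending; escalating timeout");
    }

    int main(void)
    {
            struct request rq;

            atomic_init(&rq.started, true); /* request dispatched */
            complete_request(&rq);          /* driver completes it */
            timeout_handler(&rq);           /* handler sees it is done */
            return 0;
    }
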