On Tue, May 22, 2018 at 11:01 PM, Keith Busch <keith.busch@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote:
>> OK, that still depends on driver's behaviour, even though it is true
>> for NVMe, you still have to audit other drivers about this sync
>> because it is required by your patch.
>
> Okay, forget about nvme for a moment here. Please run this thought
> experiment to decide if what you're saying is even plausible for any
> block driver with the existing implementation:
>
> Your scenario has a driver return EH_HANDLED for a timed out request. The
> timeout work then completes the request. The request may then be
> reallocated for a new command and dispatched.

Yes.

> At approximately the same time, you're saying the driver that returned
> EH_HANDLED may then call blk_mq_complete_request() in reference to the
> *old* request that it returned EH_HANDLED for, incorrectly completing

Because this request has already been completed by the blk-mq timeout
handler, the old request won't be completed again via
blk_mq_complete_request(), whether from the normal completion path or
anywhere else, such as cancel.

> the new request before it is done. That will inevitably lead to data
> corruption. Is that happening today in any driver?

There is no such issue today, because current blk-mq makes sure a
request is only completed once; your patch changes that so the driver
has to guarantee it instead.

Thanks,
Ming Lei
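
[Editorial note: the "completed only once" guarantee discussed above can be
illustrated with a minimal, self-contained sketch. This is plain C11 with
hypothetical names (rq_mark_complete, RQ_IN_FLIGHT, etc.), not the actual
blk-mq code: both the normal completion path and the timeout path funnel
through a single atomic state transition, so whichever path loses the race
becomes a no-op and a stale completion cannot complete the request twice.]

/*
 * Conceptual sketch only -- NOT the real blk-mq implementation.
 * Models the "a request is completed exactly once" behaviour: the
 * normal completion path and the timeout path race on one atomic
 * state transition, and only the winner completes the request.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum rq_state {			/* hypothetical request states */
	RQ_IN_FLIGHT,
	RQ_COMPLETE,
};

struct request {
	_Atomic enum rq_state state;
	int tag;
};

/* Returns true only for the one caller that wins the transition. */
static bool rq_mark_complete(struct request *rq)
{
	enum rq_state expected = RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      RQ_COMPLETE);
}

/* Normal completion path (e.g. driver interrupt handler). */
static void complete_request(struct request *rq)
{
	if (!rq_mark_complete(rq)) {
		/* Lost the race: the timeout path already completed it. */
		printf("tag %d: completion ignored, already done\n", rq->tag);
		return;
	}
	printf("tag %d: completed from normal path\n", rq->tag);
}

/* Timeout path, after the driver reported the request as handled. */
static void timeout_complete_request(struct request *rq)
{
	if (!rq_mark_complete(rq))
		return;		/* normal completion beat us to it */
	printf("tag %d: completed from timeout path\n", rq->tag);
}

int main(void)
{
	struct request rq = { .state = RQ_IN_FLIGHT, .tag = 7 };

	/* Timeout fires first; a later stale completion is a no-op. */
	timeout_complete_request(&rq);
	complete_request(&rq);
	return 0;
}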