On Tue, May 22, 2018 at 11:17 PM, Keith Busch <keith.busch@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 22, 2018 at 11:07:07PM +0800, Ming Lei wrote:
>> > At approximately the same time, you're saying the driver that returned
>> > EH_HANDLED may then call blk_mq_complete_request() in reference to the
>> > *old* request that it returned EH_HANDLED for, incorrectly completing
>>
>> Because this request has been completed above by blk-mq timeout,
>> then this old request won't be completed any more via blk_mq_complete_request()
>> either from normal path or what ever, such as cancel.
>
>> > the new request before it is done. That will inevitably lead to data
>> > corruption. Is that happening today in any driver?
>>
>> No such issue since current blk-mq makes sure one req is only completed
>> once, but your patch changes to depend on driver to make sure that.
>
> The blk-mq timeout complete makes the request available for allocation
> as a new command, at which point blk_mq_complete_request can be called
> again. If a driver is somehow relying on blk-mq to prevent a double
> completion for a previously completed request context, they're already
> in a lot of trouble.

Yes, previously there was the atomic flag REQ_ATOM_COMPLETE to cover
atomic completion, and recently Tejun changed that to an aborted state
with a generation counter, but both provide a form of atomic completion.

So even though the code is much simplified by using the request refcount,
atomic completion should still be provided by blk-mq; otherwise drivers
have to be audited to avoid double completion.

Thanks,
Ming Lei
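
To illustrate what "atomic completion" means here, below is a minimal
userspace sketch, assuming a flag that works roughly like
REQ_ATOM_COMPLETE did. It is not blk-mq code: struct sketch_request,
sketch_request_init() and sketch_complete_request() are hypothetical
names, and C11 atomics stand in for the kernel's bit operations. The
generation counter is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request; not the kernel's layout. */
struct sketch_request {
    atomic_bool completed;  /* plays the role of a "complete" flag */
    int completions;        /* how many times the completion body ran */
};

/* Models (re)allocating the request context for a new command. */
static void sketch_request_init(struct sketch_request *rq)
{
    assert(rq != NULL);
    atomic_init(&rq->completed, false);
    rq->completions = 0;
}

/*
 * Returns true if this caller won the right to complete the request,
 * false if some other path (timeout, cancel, normal completion)
 * already completed it.
 */
static bool sketch_complete_request(struct sketch_request *rq)
{
    assert(rq != NULL);
    /* Atomically set the flag and look at its previous value. */
    if (atomic_exchange(&rq->completed, true))
        return false;       /* already completed: do nothing */
    rq->completions++;      /* real completion work would go here */
    return true;
}
```

With this shape, the first caller of sketch_complete_request() runs the
completion body and every later caller on the same request context is
turned away. Note that calling sketch_request_init() again clears the
flag, so a flag alone does not distinguish a stale completion from one
that targets the reallocated request.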