On Tue, May 22, 2018 at 11:01 PM, Keith Busch <keith.busch@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote:
>> OK, that still depends on driver's behaviour, even though it is true
>> for NVMe, you still have to audit other drivers about this sync
>> because it is required by your patch.
>
> Okay, forget about nvme for a moment here. Please run this thought
> experiment to decide if what you're saying is even plausible for any
> block driver with the existing implementation:
>
> Your scenario has a driver return EH_HANDLED for a timed out request. The
> timeout work then completes the request. The request may then be
> reallocated for a new command and dispatched.

Yes.

> At approximately the same time, you're saying the driver that returned
> EH_HANDLED may then call blk_mq_complete_request() in reference to the
> *old* request that it returned EH_HANDLED for, incorrectly completing

Because this request has already been completed by the blk-mq timeout
handler, the old request won't be completed again via
blk_mq_complete_request(), whether from the normal completion path or
anywhere else, such as cancel.

> the new request before it is done. That will inevitably lead to data
> corruption. Is that happening today in any driver?

There is no such issue today, because current blk-mq makes sure a
request is only completed once; your patch changes that so the driver
has to guarantee it instead.

Thanks,
Ming Lei
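
[Editorial note: the "completed only once" guarantee discussed above can be
illustrated with a minimal, self-contained sketch. This is plain C11 with
hypothetical names (rq_mark_complete, RQ_IN_FLIGHT, etc.), not the actual
blk-mq code: both the normal completion path and the timeout path funnel
through a single atomic state transition, so whichever path loses the race
becomes a no-op and a stale completion cannot complete the request twice.]

/*
 * Conceptual sketch only -- NOT the real blk-mq implementation.
 * Models the "a request is completed exactly once" behaviour: the
 * normal completion path and the timeout path race on one atomic
 * state transition, and only the winner completes the request.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum rq_state {			/* hypothetical request states */
	RQ_IN_FLIGHT,
	RQ_COMPLETE,
};

struct request {
	_Atomic enum rq_state state;
	int tag;
};

/* Returns true only for the one caller that wins the transition. */
static bool rq_mark_complete(struct request *rq)
{
	enum rq_state expected = RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      RQ_COMPLETE);
}

/* Normal completion path (e.g. driver interrupt handler). */
static void complete_request(struct request *rq)
{
	if (!rq_mark_complete(rq)) {
		/* Lost the race: the timeout path already completed it. */
		printf("tag %d: completion ignored, already done\n", rq->tag);
		return;
	}
	printf("tag %d: completed from normal path\n", rq->tag);
}

/* Timeout path, after the driver reported the request as handled. */
static void timeout_complete_request(struct request *rq)
{
	if (!rq_mark_complete(rq))
		return;		/* normal completion beat us to it */
	printf("tag %d: completed from timeout path\n", rq->tag);
}

int main(void)
{
	struct request rq = { .state = RQ_IN_FLIGHT, .tag = 7 };

	/* Timeout fires first; a later stale completion is a no-op. */
	timeout_complete_request(&rq);
	complete_request(&rq);
	return 0;
}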