Re: [RFC PATCH] blk-mq: Fix lost request during timeout

Keith Busch <keith.busch@xxxxxxxxx> · Tue, 19 Sep 2017 12:39:29 -0400

On Tue, Sep 19, 2017 at 03:18:45PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-19 at 11:07 -0400, Keith Busch wrote:
> > The problem is when blk-mq's timeout handler prevents the request from
> > completing, and doesn't leave any indication the driver requested to
> > complete it. Who is responsible for completing that request now?
> 
> Hello Keith,
> 
> My understanding is that block drivers are responsible for completing timed
> out requests by using one of the following approaches:
> * By returning BLK_EH_HANDLED or BLK_EH_RESET_TIMER from inside the timeout
>   handler.

No problem with BLK_EH_HANDLED when the timeout handler completes the
request. That usage at least makes sense.

In NVMe, we use BLK_EH_RESET_TIMER if the driver starts an asynchronous
action to reclaim the request. If the request is returned very soon though
(before blk-mq clears ATOM_COMPLETE), blk_mq_complete_request will still
do nothing.
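
To make the window concrete, here is a rough userspace model of that
ordering. It is not kernel code: every model_* name is made up for
illustration, and the flag handling is only my reading of how the timeout
path and blk_mq_complete_request interact around ATOM_COMPLETE.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; only the completion flag is modeled. */
struct model_request {
	atomic_bool atom_complete;	/* models the ATOM_COMPLETE bit */
	bool completed;			/* true once the completion path really ran */
};

/* Models blk_mq_complete_request(): does nothing if the flag is already set. */
static bool model_complete_request(struct model_request *rq)
{
	if (atomic_exchange(&rq->atom_complete, true))
		return false;		/* flag already owned, completion is dropped */
	rq->completed = true;
	return true;
}

/*
 * Models one timeout on one request whose driver handler returns
 * BLK_EH_RESET_TIMER. If reclaim_before_clear is true, the driver's
 * asynchronous reclaim finishes while the timeout path still holds the flag.
 * Returns true if the request ends up never completed (i.e. lost).
 */
static bool model_request_lost(bool reclaim_before_clear)
{
	struct model_request rq;

	atomic_init(&rq.atom_complete, false);
	rq.completed = false;

	/* Timeout path marks the request complete before calling the driver. */
	atomic_exchange(&rq.atom_complete, true);

	/* Driver reclaims the request "very soon": this call is a no-op. */
	if (reclaim_before_clear)
		model_complete_request(&rq);

	/* RESET_TIMER handling: the flag is cleared and the timer re-armed. */
	atomic_store(&rq.atom_complete, false);

	/* A slower reclaim arrives after the clear and completes normally. */
	if (!reclaim_before_clear)
		model_complete_request(&rq);

	return !rq.completed;
}
```

In this model, model_request_lost(true) reports the request as lost and
model_request_lost(false) does not; the only difference is whether the
driver's completion lands before or after the flag is cleared.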

> * By returning BLK_EH_NOT_HANDLED and by calling blk_mq_end_request() or
>   __blk_mq_end_request() for the request that timed out.

You want to bypass __blk_mq_complete_request? Doesn't that actually do
important things with queue and scheduler stats? If it's not important,
then this sounds like the piece I'm looking for. But it also puts a
burden on each driver to track the state of its requests, which blk-mq
could do for all drivers.