Re: [RFC PATCH] blk-mq: Fix lost request during timeout

On Tue, Sep 19, 2017 at 12:16:31PM +0800, Ming Lei wrote:
> On Tue, Sep 19, 2017 at 7:08 AM, Keith Busch <keith.busch@xxxxxxxxx> wrote:
> >
> > Indeed that prevents .complete from running concurrently with the
> > timeout handler, but scsi_mq_done and nvme_handle_cqe are not .complete
> > callbacks. These are the LLD functions that call blk_mq_complete_request
> > well before .complete. If the driver calls blk_mq_complete_request on
> > a request that blk-mq is timing out, the request is lost because blk-mq
> > already called blk_mark_rq_complete. Nothing prevents these LLD functions
> 
> That shouldn't happen because only one blk_mark_rq_complete() can win
> and it is the winner's responsibility to complete the request, so
> there shouldn't be request lost. Especially in your case, it is the
> responsibility of timeout to complete the rq really.

Hm, either I'm explaining this poorly, or I'm missing something that's
obvious to everyone else.

The driver's IRQ handler has no idea it's racing with the blk-mq timeout
handler, and there's nothing indicating it lost the race. The IRQ handler
just calls blk_mq_complete_request. As far as the driver is concerned,
it has done its part to complete the request at that point.

The problem is when blk-mq's timeout handler prevents the request from
completing, and doesn't leave any indication the driver requested to
complete it. Who is responsible for completing that request now?
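
A minimal userspace sketch of the sequence described above may make the
question concrete. It is not kernel code: every model_* name is
hypothetical, the "complete" mark is modelled with a C11 atomic_flag, and
it assumes the timeout path decides to keep waiting on the request (so it
clears the mark) rather than completing the request itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; all model_* names are hypothetical. */
struct model_rq {
	atomic_flag complete;	/* stands in for the "complete" mark */
	int completions;	/* times the completion path actually ran */
};

/* Returns true if the request was already marked, i.e. the caller lost. */
static bool model_mark_rq_complete(struct model_rq *rq)
{
	return atomic_flag_test_and_set(&rq->complete);
}

/* What the driver's IRQ handler calls. It gets no return value back. */
static void model_complete_request(struct model_rq *rq)
{
	if (!model_mark_rq_complete(rq))
		rq->completions++;
	/* else: silently dropped; nothing records the driver's attempt */
}

/* Returns how many times the request was completed in each scenario. */
static int model_run(bool irq_during_timeout)
{
	struct model_rq rq = { .complete = ATOMIC_FLAG_INIT, .completions = 0 };

	if (!irq_during_timeout) {
		/* Normal case: the IRQ arrives with no timeout in progress. */
		model_complete_request(&rq);
	} else {
		/* Timeout path marks the request first and wins the race. */
		bool timeout_won = !model_mark_rq_complete(&rq);
		assert(timeout_won);
		(void)timeout_won;

		/* IRQ handler runs now; its completion is dropped. */
		model_complete_request(&rq);

		/* Assumed outcome: timeout path keeps waiting, clears mark. */
		atomic_flag_clear(&rq.complete);
	}
	return rq.completions;
}
```

In this model, model_run(false) completes the request once, while
model_run(true) leaves it with zero completions: the driver has already
made its only call, and the timeout path is waiting for a completion that
will not be repeated.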


