Re: [RFC PATCH] blk-mq: Fix lost request during timeout

On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote:
> I think we've always known it's possible to lose a request during timeout
> handling, but just accepted that possibility. It seems to be causing
> problems, though, leading to unnecessary error escalation and IO failures.
> 
> The possibility arises when the block layer marks the request complete
> prior to running the timeout handler. If that request happens to complete
> while the handler is running, the request will be lost, inevitably
> triggering a second timeout.
> 
> This patch attempts to shorten the window for this race condition by
> clearing the started flag when the driver completes a request. The block
> layer's timeout handler will then complete the command if it observes
> the started flag is no longer set.
> 
> Note it's possible to lose the command even with this patch. It's just
> less likely to happen.
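
For readers following along, the race and the proposed mitigation described
above can be modelled in userspace roughly as follows. This is only a sketch
under stated assumptions, not the patch itself: struct model_rq, its fields,
and the model_* functions are hypothetical names made up for illustration, and
the "race" parameter simply forces the driver completion to arrive while the
timeout handler is running.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request; not the kernel's struct request. */
struct model_rq {
	atomic_bool complete;	/* models the "marked complete" bit */
	atomic_bool started;	/* models the "started" flag */
	int end_io_calls;	/* how many times the request was finished */
};

/* Issue the request: it is started and not yet complete. */
static void model_start(struct model_rq *rq)
{
	atomic_init(&rq->complete, false);
	atomic_init(&rq->started, true);
	rq->end_io_calls = 0;
}

/*
 * Driver completion path. Whoever sets the complete bit first owns the
 * request. Returns true if the request was finished here.
 */
static bool model_driver_complete(struct model_rq *rq, bool patched)
{
	if (patched)
		atomic_store(&rq->started, false); /* proposed change */

	if (atomic_exchange(&rq->complete, true))
		return false;	/* timeout path owns it: completion dropped */

	rq->end_io_calls++;
	return true;
}

/*
 * Timeout path. The request is marked complete before the handler runs.
 * With race == true the driver completes the request while the handler
 * is running. Returns true if the timeout path finished the request.
 */
static bool model_timeout(struct model_rq *rq, bool patched, bool race)
{
	if (atomic_exchange(&rq->complete, true))
		return false;	/* already completed normally */

	if (race)
		model_driver_complete(rq, patched);

	/* Assume the handler asks for the timer to be reset. */
	if (patched && !atomic_load(&rq->started)) {
		rq->end_io_calls++;	/* driver completed it meanwhile */
		return true;
	}

	atomic_store(&rq->complete, false);	/* re-arm; still outstanding */
	return false;
}
```

In this model, calling model_timeout(&rq, false, true) leaves end_io_calls at
zero, so the request stays outstanding until a second timeout, while
model_timeout(&rq, true, true) finishes it exactly once. As the description
notes, a window remains even with the change: a completion that arrives after
the started check but before the re-arm is still dropped.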

Hello Keith,

Are you sure the root cause of this race condition is in the blk-mq core?
I've never observed such behavior in any of my numerous scsi-mq tests (which
trigger timeouts). Are you sure the race you observed is not caused by a
blk_mq_reinit_tagset() call, a function that is only used by the NVMe driver
and not by any other block driver?

Bart.



