On Fri, May 29, 2020 at 03:53:15PM +0200, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@xxxxxxxxxx>
>
> Most blk-mq drivers depend on the managed IRQs' automatic affinity to set
> up their queue mapping. Thomas made the following point[1]:
>
> "That was the constraint of managed interrupts from the very beginning:
>
>  The driver/subsystem has to quiesce the interrupt line and the associated
>  queue _before_ it gets shutdown in CPU unplug and not fiddle with it
>  until it's restarted by the core when the CPU is plugged in again."
>
> However, the current blk-mq implementation doesn't quiesce the hw queue
> before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
> is a cpuhp state handled after the CPU is down, so there is no chance to
> quiesce the hctx before shutting down the CPU.
>
> Add a new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
> whose last CPU goes away, and wait for completion of in-flight requests.
> This guarantees that there is no in-flight I/O before shutting down the
> managed IRQ.
>
> Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't
> need to wait for completion of in-flight requests from these drivers,
> which avoids a potential deadlock. This is safe for stacking drivers, as
> they do not use interrupts at all and their I/O completions are triggered
> by the underlying devices' I/O completions.
>
> [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@xxxxxxxxxxxxxxxxxxxxxxx/
>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> [hch: different retry mechanism, merged two patches, minor cleanups]
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

Reviewed-by: Daniel Wagner <dwagner@xxxxxxx>
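
For readers following along, here is a rough sketch of what the offline
callback for the new CPUHP_AP_BLK_MQ_ONLINE state could look like. It is
illustrative only, not the patch itself: the BLK_MQ_S_INACTIVE bit, the
cpuhp_online hlist node, and the helpers blk_mq_last_cpu_in_hctx() and
blk_mq_hctx_has_requests() are assumed names for this sketch and may not
match the final code. The callback would be registered with
cpuhp_setup_state_multi(); the rest uses standard kernel primitives.

/*
 * Illustrative sketch: quiesce an hctx when its last online CPU is
 * about to go away, so the managed IRQ can be shut down with no
 * in-flight I/O. Assumed helpers are noted above.
 */
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
			struct blk_mq_hw_ctx, cpuhp_online);

	/* Nothing to do unless @cpu is the last online CPU mapped to @hctx. */
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
		return 0;

	/* Stop the tag allocator from handing out new requests on this hctx. */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb__after_atomic();

	/*
	 * If we cannot get a queue reference, the queue is already frozen
	 * and there is nothing in flight. Otherwise wait for outstanding
	 * requests on this hctx to complete.
	 */
	if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
		while (blk_mq_hctx_has_requests(hctx))
			msleep(5);
		percpu_ref_put(&hctx->queue->q_usage_counter);
	}

	return 0;
}

A stacking driver such as loop or dm-rq would simply OR BLK_MQ_F_STACKING
into its tag_set flags at init time, so the core skips this wait for its
hctxs, which is what avoids the deadlock mentioned in the commit message.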