On Fri, May 29, 2020 at 03:53:15PM +0200, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@xxxxxxxxxx>
>
> Most blk-mq drivers depend on the managed IRQs' automatic affinity to set
> up their queue mapping. Thomas made the following point[1]:
>
> "That was the constraint of managed interrupts from the very beginning:
>
>  The driver/subsystem has to quiesce the interrupt line and the associated
>  queue _before_ it gets shutdown in CPU unplug and not fiddle with it
>  until it's restarted by the core when the CPU is plugged in again."
>
> However, the current blk-mq implementation doesn't quiesce the hw queue
> before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
> is a cpuhp state handled after the CPU is down, so there is no chance to
> quiesce the hctx before shutting down the CPU.
>
> Add a new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
> whose last CPU goes away, and wait for completion of in-flight requests.
> This guarantees that there is no in-flight I/O before shutting down the
> managed IRQ.
>
> Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't
> need to wait for completion of in-flight requests from these drivers,
> which avoids a potential deadlock. This is safe for stacking drivers, as
> they do not use interrupts at all and their I/O completions are triggered
> by the underlying devices' I/O completions.
>
> [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@xxxxxxxxxxxxxxxxxxxxxxx/
>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> [hch: different retry mechanism, merged two patches, minor cleanups]
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

Reviewed-by: Daniel Wagner <dwagner@xxxxxxx>
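
For readers following along, here is a rough sketch of what the offline
callback for the new CPUHP_AP_BLK_MQ_ONLINE state could look like. It is
illustrative only, not the patch itself: the BLK_MQ_S_INACTIVE bit, the
cpuhp_online hlist node, and the helpers blk_mq_last_cpu_in_hctx() and
blk_mq_hctx_has_requests() are assumed names for this sketch and may not
match the final code. The callback would be registered with
cpuhp_setup_state_multi(); the rest uses standard kernel primitives.

/*
 * Illustrative sketch: quiesce an hctx when its last online CPU is
 * about to go away, so the managed IRQ can be shut down with no
 * in-flight I/O. Assumed helpers are noted above.
 */
static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
{
	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
			struct blk_mq_hw_ctx, cpuhp_online);

	/* Nothing to do unless @cpu is the last online CPU mapped to @hctx. */
	if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
	    !blk_mq_last_cpu_in_hctx(cpu, hctx))
		return 0;

	/* Stop the tag allocator from handing out new requests on this hctx. */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb__after_atomic();

	/*
	 * If we cannot get a queue reference, the queue is already frozen
	 * and there is nothing in flight. Otherwise wait for outstanding
	 * requests on this hctx to complete.
	 */
	if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
		while (blk_mq_hctx_has_requests(hctx))
			msleep(5);
		percpu_ref_put(&hctx->queue->q_usage_counter);
	}

	return 0;
}

A stacking driver such as loop or dm-rq would simply OR BLK_MQ_F_STACKING
into its tag_set flags at init time, so the core skips this wait for its
hctxs, which is what avoids the deadlock mentioned in the commit message.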