Re: [PATCH V9 00/11] blk-mq: improvement CPU hotplug

Ming Lei <ming.lei@xxxxxxxxxx> · Mon, 4 May 2020 16:38:31 +0800

On Mon, May 04, 2020 at 09:14:20AM +0100, John Garry wrote:
> On 03/05/2020 00:54, Ming Lei wrote:
> > Hi,
> > 
> > Thomas mentioned:
> >      "
> >       That was the constraint of managed interrupts from the very beginning:
> >        The driver/subsystem has to quiesce the interrupt line and the associated
> >        queue _before_ it gets shutdown in CPU unplug and not fiddle with it
> >        until it's restarted by the core when the CPU is plugged in again.
> >      "
> > 
> > But no drivers or blk-mq do that before one hctx becomes inactive(all
> > CPUs for one hctx are offline), and even it is worse, blk-mq stills tries
> > to run hw queue after hctx is dead, see blk_mq_hctx_notify_dead().
> > 
> > This patchset tries to address the issue by two stages:
> > 
> > 1) add one new cpuhp state of CPUHP_AP_BLK_MQ_ONLINE
> > 
> > - mark the hctx as internal stopped, and drain all in-flight requests
> > if the hctx is going to be dead.
> > 
> > 2) re-submit IO in the state of CPUHP_BLK_MQ_DEAD after the hctx becomes dead
> > 
> > - steal bios from the request, and resubmit them via generic_make_request(),
> > then these IO will be mapped to other live hctx for dispatch
> > 
> > Thanks John Garry for running lots of tests on arm64 with this patchset
> > and co-working on investigating all kinds of issues.
> > 
> > Thanks Christoph's review on V7.
> > 
> > Please comment & review, thanks!
> > 
> 
> Hi Ming,
> 
> Bad news, I see this on the first hotplug loop:
> 
> [66.628964] CPU2: shutdown
> [66.631663] psci: CPU2 killed (polled 0 ms)
> [66.681062] CPU3: shutdown
> [66.683761] psci: CPU3 killed (polled 0 ms)
> [66.717097] CPU4: shutdown
> [66.719796] psci: CPU4 killed (polled 0 ms)
> [66.753136] CPU5: shutdown
> [66.755834] psci: CPU5 killed (polled 0 ms)
> [66.801234] CPU6: shutdown=2746MiB/s,w=0KiB/s][r=703k,w=0 IOPS][eta 00m:55s]
> [66.803932] psci: CPU6 killed (polled 0 ms)
> [66.837170] irq_shutdown irq150
> [66.840410] CPU7: shutdown
> [66.843112] psci: CPU7 killed (polled 0 ms)
> [66.885394] CPU8: shutdown
> [66.888092] psci: CPU8 killed (polled 0 ms)
> [66.925431] CPU9: shutdown
> [66.928128] psci: CPU9 killed (polled 0 ms)
> [66.965526] CPU10: shutdown
> [66.968311] psci: CPU10 killed (polled 0 ms)
> [67.025569] irq_shutdown irq151
> [67.028808] CPU11: shutdown
> [67.031599] psci: CPU11 killed (polled 0 ms)
> [67.306393] CPU12: shutdown
> [67.309179] psci: CPU12 killed (polled 0 ms)
> [67.347266] CPU13: shutdown
> [67.350058] psci: CPU13 killed (polled 0 ms)
> [67.410910] CPU14: shutdown
> [67.413695] psci: CPU14 killed (polled 0 ms)
> [67.454453] irq_shutdown irq152
> [67.457699] CPU15: shutdown
> [67.457799] ------------[ cut here ]------------
> [67.460506] psci: CPU15 killed (polled 0 ms)
> [67.465091] refcount_t: underflow; use-after-free.
> [67.465112] WARNING: CPU: 0 PID: 557 at lib/refcount.c:28
> refcount_warn_saturate+0x6c/0x13c
> [67.482468] Modules linked in:
> [67.485511] CPU: 0 PID: 557 Comm: irq/149-hisi_sa Not tainted
> 5.7.0-rc2-13190-gb615db31b5f4 #331
> [67.494281] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21
> Nemo 2.0 RC0 04/18/2018
> [67.503398] pstate: 20000005 (nzCv daif -PAN -UAO)
> [67.508176] pc : refcount_warn_saturate+0x6c/0x13c
> [67.512954] lr : refcount_warn_saturate+0x6c/0x13c
> [67.517730] sp : ffff80001fd3b920
> [67.521031] x29: ffff80001fd3b920 x28: ffff001f9c549500
> [67.526329] x27: ffff001f9c54956c x26: 000000000000d8d0
> [67.531627] x25: ffff001fb08c7378 x24: 0000000000000000
> [67.536925] x23: ffff001f9a6eef00 x22: 0000000000000000
> [67.542223] x21: 0000000000001000 x20: ffff001f9ebd0918
> [67.547521] x19: ffff001f9a6eef00 x18: 0000000000000000
> [67.552820] x17: 000000007fffffff x16: 0000000000000311
> [67.558118] x15: 000000000001ffff x14: 0000000000000001
> [67.563416] x13: 000000000000001f x12: 0000000000000000
> [67.568714] x11: ffff800011aca468 x10: 0000000000fddfa0
> [67.574012] x9 : 0000000000000000 x8 : ffff041faa022098
> [67.579310] x7 : 0000000000000000 x6 : ffff001ffbe871d0
> [67.584608] x5 : 0000000000000001 x4 : 0000000000000000
> [67.589906] x3 : 0000000000000000 x2 : 0000000000000007
> [67.595204] x1 : 0000000100000001 x0 : 0000000000000026
> [67.600503] Call trace:
> [67.602937]  refcount_warn_saturate+0x6c/0x13c
> [67.607369]  aio_complete_rw+0x350/0x384
> [67.611279]  blkdev_bio_end_io+0xc4/0x12c
> [67.615276]  bio_endio+0x104/0x130
> [67.618665]  blk_update_request+0x98/0x37c
> [67.622748]  blk_mq_end_request+0x24/0x138
> [67.626831]  blk_mq_resubmit_end_rq+0x40/0x58
> [67.631174]  __blk_mq_end_request+0xb0/0x10c
> [67.635432]  scsi_end_request+0xdc/0x20c

Looks an old issue, I believe the following patch can fix the issue:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 12dee4ecd5cc..3fc79d4b2fe0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2415,6 +2415,10 @@ static void blk_mq_resubmit_rq(struct request *rq)
 	nrq->bio = rq->bio;
 	nrq->biotail = rq->biotail;
 
+	/* Now all bios ownership is transfered to 'nrq' */
+	rq->bio = rq->biotail = NULL;
+	rq->__data_len = 0;
+
 	if (blk_insert_cloned_request(nrq->q, nrq) != BLK_STS_OK)
 		blk_mq_request_bypass_insert(nrq, false, true);
 }


Thanks,
Ming