On Mon, Jan 20, 2020 at 9:24 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> On 15/01/2020 17:00, John Garry wrote:
> > On 15/01/2020 11:44, Ming Lei wrote:
> >> Hi,
> >>
> >> Thomas mentioned:
> >>     "
> >>      That was the constraint of managed interrupts from the very beginning:
> >>
> >>      The driver/subsystem has to quiesce the interrupt line and the associated
> >>      queue _before_ it gets shutdown in CPU unplug and not fiddle with it
> >>      until it's restarted by the core when the CPU is plugged in again.
> >>     "
> >>
> >> But neither drivers nor blk-mq do that before one hctx becomes inactive
> >> (all CPUs mapped to the hctx are offline), and even worse, blk-mq still
> >> tries to run the hw queue after the hctx is dead, see
> >> blk_mq_hctx_notify_dead().
> >>
> >> This patchset addresses the issue in two stages:
> >>
> >> 1) add one new cpuhp state of CPUHP_AP_BLK_MQ_ONLINE
> >>
> >>    - mark the hctx as internally stopped, and drain all in-flight requests
> >>      if the hctx is going to be dead.
> >>
> >> 2) re-submit IO in the state of CPUHP_BLK_MQ_DEAD after the hctx
> >>    becomes dead
> >>
> >>    - steal bios from the request, and resubmit them via
> >>      generic_make_request(); these IOs will then be mapped to other
> >>      live hctxs for dispatch.
> >>
> >> Please comment & review, thanks!
> >>
> >> V5:
> >> 	- rename BLK_MQ_S_INTERNAL_STOPPED as BLK_MQ_S_INACTIVE
> >> 	- re-factor code for re-submitting requests in the cpu dead hotplug handler
> >> 	- address requeue corner case
> >>
> >> V4:
> >> 	- resubmit IOs in the dispatch list in case that this hctx is dead
> >>
> >> V3:
> >> 	- re-organize patch 2 & 3 a bit for addressing Hannes's comment
> >> 	- fix patch 4 for avoiding potential deadlock, as found by Hannes
> >>
> >> V2:
> >> 	- patch 4 & patch 5 in V1 have been merged to the block tree, so remove
> >> 	  them
> >> 	- address comments from John Garry and Minwoo
> >>
> >
> Hi Ming,
>
> Colleague Xiang Chen found this:
>
> Starting 40 processes
> Jobs: 40 (f=40)
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [8.3% done]
> [2899MB/0KB/0KB /s] [742K/0/0 iops] [eta 00m:33s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [11.1% done]
> [2892MB/0KB/0KB /s] [740K/0/0 iops] [eta 00m:32s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [13.9% done]
> [2887MB/0KB/0KB /s] [739K/0/0 iops] [eta 00m:31s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [16.7% done]
> [2850MB/0KB/0KB /s] [730K/0/0 iops] [eta 00m:30s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [19.4% done]
> [2834MB/0KB/0KB /s] [726K/0/0 iops] [eta 00m:29s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [22.2% done]
> [2827MB/0KB/0KB /s] [724K/0/0 iops] [eta 00m:28s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [25.0% done]
> [2804MB/0KB/0KB /s] [718K/0/0 iops] [eta 00m:27s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [27.8% done]
> [2783MB/0KB/0KB /s] [713K/0/0 iops] [eta 00m:26s]
> [ 112.990292] CPU16: shutdown
> [ 112.993113] psci: CPU16 killed (polled 0 ms)
> [ 113.029429] CPU15: shutdown
> [ 113.032245] psci: CPU15 killed (polled 0 ms)
> [ 113.073649] CPU14: shutdown
> [ 113.076461] psci: CPU14 killed (polled 0 ms)
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR[ 113.121258] CPU13: shutdown
> RRRRRRRRRR] [30.[ 113.124247] psci: CPU13 killed (polled 0 ms)
> 6% done] [2772MB/0KB/0KB /s] [710K/0/0 iops] [eta 00m:25s]
> [ 113.177108] IRQ 600: no longer affine to CPU12
> [ 113.183185] CPU12: shutdown
> [ 113.186001] psci: CPU12 killed (polled 0 ms)
> [ 113.242555] CPU11: shutdown
> [ 113.245368] psci: CPU11 killed (polled 0 ms)
> [ 113.302598] CPU10: shutdown
> [ 113.305420] psci: CPU10 killed (polled 0 ms)
> [ 113.365757] CPU9: shutdown
> [ 113.368489] psci: CPU9 killed (polled 0 ms)
> [ 113.611904] IRQ 599: no longer affine to CPU8
> [ 113.620611] CPU8: shutdown
> [ 113.623340] psci: CPU8 killed (polled 0 ms)
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [33.3% done]
> [2765MB/0KB/0KB /s] [708K/0/0 iops] [eta 00m:24s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [36.1% done]
> [2767MB/0KB/0KB /s] [708K/0/0 iops] [eta 00m:23s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [38.9% done]
> [2771MB/0KB/0KB /s] [709K/0/0 iops] [eta 00m:22s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [41.7% done]
> [2780MB/0KB/0KB /s] [712K/0/0 iops] [eta 00m:21s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [44.4% done]
> [2731MB/0KB/0KB /s] [699K/0/0 iops] [eta 00m:20s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [47.2% done]
> [2757MB/0KB/0KB /s] [706K/0/0 iops] [eta 00m:19s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [50.0% done]
> [2764MB/0KB/0KB /s] [708K/0/0 iops] [eta 00m:18s]
> [ 120.762141] ------------[ cut here ]------------
> [ 120.766760] WARNING: CPU: 0 PID: 10 at block/blk.h:299 generic_make_request_checks+0x480/0x4e8
> [ 120.775348] Modules linked in:
> [ 120.778397] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.5.0-
>
> Then this:
>
> 1158048319d:18h:40m:34s]
> Jobs: 40 (f=40): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] [0.0% done]
> [0KB/0KB/0KB /s] [0/0/0 iops] [eta 1158049605d:23h:03m:30s]
> [ 141.915198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 141.921280] rcu: 96-...0: (4 GPs behind) idle=af6/0/0x1 softirq=90/90 fqs=2626
> [ 141.928658] rcu: 99-...0: (2 GPs behind) idle=b82/1/0x4000000000000000 softirq=82/82 fqs=2627
> [ 141.937339] (detected by 17, t=5254 jiffies, g=3925, q=545)
> [ 141.942985] Task dump for CPU 96:
> [ 141.946292] swapper/96      R  running task        0     0      1 0x0000002a
> [ 141.953321] Call trace:
> [ 141.955763]  __switch_to+0xbc/0x218
> [ 141.959244]  0x0
> [ 141.961079] Task dump for CPU 99:
> [ 141.964385] kworker/99:1H   R  running task        0  3473      2 0x0000022a
> [ 141.971417] Workqueue: kblockd blk_mq_run_work_fn
> [ 141.976109] Call trace:
> [ 141.978550]  __switch_to+0xbc/0x218
> [ 141.982029]  blk_mq_run_work_fn+0x1c/0x28
> [ 141.986027]  process_one_work+0x1e0/0x358
> [ 141.990025]  worker_thread+0x40/0x488
> [ 141.993678]  kthread+0x118/0x120
> [ 141.996897]  ret_from_fork+0x10/0x18

Hi John,

Thanks for your test!

Could you test the following patchset? Only the last patch is changed:

https://github.com/ming1/linux/commits/my_for_5.6_block

Thanks,
Ming Lei
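
P.S. For readers following stage 2 of the quoted cover letter, the "steal bios and
resubmit" step roughly amounts to the sketch below. This is only an illustration
against the v5.5-era block API: the function name blk_mq_resubmit_rq_sketch() and
the exact way the emptied request is completed are assumptions made here, not code
taken from the series.

#include <linux/bio.h>
#include <linux/blk-mq.h>
#include <linux/blkdev.h>

/*
 * Illustrative sketch only (hypothetical name): detach the bios from a
 * request that was queued on a now-dead hctx and feed them back into the
 * block layer, so the normal submission path maps them to a live hctx.
 */
static void blk_mq_resubmit_rq_sketch(struct request *rq)
{
	struct bio_list list;
	struct bio *bio;

	bio_list_init(&list);

	/* Move rq's bio chain onto a private list; rq ends up bio-less. */
	blk_steal_bios(&list, rq);

	/* Complete the now empty request so its tag and driver data are released. */
	blk_mq_end_request(rq, BLK_STS_OK);

	/*
	 * Re-enter the submission path from the current (online) CPU; the
	 * blk-mq cpu-to-hctx mapping then selects a hctx that is still alive.
	 */
	while ((bio = bio_list_pop(&list)))
		generic_make_request(bio);
}

The real series has more to handle than this (requests already sitting on the
dispatch list, and the requeue corner case mentioned in the V5 changelog), but the
bio hand-off via generic_make_request() is the mechanism the cover letter describes.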