On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 04:00 AM, Ming Lei wrote:
> > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>> Hi Christian,
> >>>
> >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>> FWIW, this patch does not fix the issue for me:
> >>>>
> >>>> ostname=? addr=? terminal=? res=success'
> >>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>> [ 21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>> [ 21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>> [ 21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>> [ 21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>                         000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>                        #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>                        >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>                         000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>                         000000000069c5b2: 07f4		bcr	15,%r4
> >>>>                         000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>                         000000000069c5ba: a7f4fff6	brc	15,69c5a6
> >>>> [ 21.455067] Call Trace:
> >>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>> [ 21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100
> >>>> [ 21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88
> >>>> [ 21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8
> >>>> [ 21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8
> >>>> [ 21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490
> >>>> [ 21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120
> >>>> [ 21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348
> >>>> [ 21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0
> >>>> [ 21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8
> >>>> [ 21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298
> >>>> [ 21.455136] Last Breaking-Event-Address:
> >>>> [ 21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>
> >>> Thinking about this issue further, I can't understand the root cause for
> >>> this issue.
> >>>
> >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that
> >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> >>>
> >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> >>> following command?
> >>
> >> # lscpu
> >> Architecture:        s390x
> >> CPU op-mode(s):      32-bit, 64-bit
> >> Byte Order:          Big Endian
> >> CPU(s):              16
> >> On-line CPU(s) list: 0-15
> >> Thread(s) per core:  2
> >> Core(s) per socket:  8
> >> Socket(s) per book:  3
> >> Book(s) per drawer:  2
> >> Drawer(s):           4
> >> NUMA node(s):        1
> >> Vendor ID:           IBM/S390
> >> Machine type:        2964
> >> CPU dynamic MHz:     5000
> >> CPU static MHz:      5000
> >> BogoMIPS:            20325.00
> >> Hypervisor:          PR/SM
> >> Hypervisor vendor:   IBM
> >> Virtualization type: full
> >> Dispatching mode:    horizontal
> >> L1d cache:           128K
> >> L1i cache:           96K
> >> L2d cache:           2048K
> >> L2i cache:           2048K
> >> L3 cache:            65536K
> >> L4 cache:            491520K
> >> NUMA node0 CPU(s):   0-15
> >> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
> >>
> >> # lsdasd
> >> Bus-ID     Status    Name    Device  Type  BlkSz  Size     Blocks
> >> ==============================================================================
> >> 0.0.3f75   active    dasda   94:0    ECKD  4096   21129MB  5409180
> >> 0.0.3f76   active    dasdb   94:4    ECKD  4096   21129MB  5409180
> >> 0.0.3f77   active    dasdc   94:8    ECKD  4096   21129MB  5409180
> >> 0.0.3f74   active    dasdd   94:12   ECKD  4096   21129MB  5409180
> > 
> > I have tried to emulate your CPU topo via VM and the blk-mq mapping of
> > null_blk is basically similar with your DASD mapping, but still can't
> > reproduce your issue.
> > 
> > BTW, do you need to do cpu hotplug or other actions for triggering this warning?
> 
> No, without hotplug.

From the debugfs log, hctx0 is mapped to lots of CPUs, so it shouldn't be
unmapped. Could you check whether it is hctx0 that is unmapped when the
warning is triggered? If not, which hctx is unmapped?

You can do that by adding one extra line:

	printk("unmapped hctx %d", hctx->queue_num);

Thanks,
Ming