Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

On 04/04/2018 09:22 PM, Sagi Grimberg wrote:


On 03/30/2018 12:32 PM, Yi Zhang wrote:
Hello
I got this kernel BUG on 4.16.0-rc7 during my NVMeoF RDMA testing; here is the reproducer and log, let me know if you need more info, thanks.

Reproducer:
1. setup target
#nvmetcli restore /etc/rdma.json
2. connect target on host
#nvme connect-all -t rdma -a $IP -s 4420
3. do fio background on host
#fio -filename=/dev/nvme0n1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -bs_unaligned -runtime=180 -size=-group_reporting -name=mytest -numjobs=60 &
4. offline cpu on host
#echo 0 > /sys/devices/system/cpu/cpu1/online
#echo 0 > /sys/devices/system/cpu/cpu2/online
#echo 0 > /sys/devices/system/cpu/cpu3/online
5. clear target
#nvmetcli clear
6. restore target
#nvmetcli restore /etc/rdma.json
7. check console log on host

Hi Yi,

Does this happen with this applied?
--
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..b89da55e8aaa 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -35,6 +35,8 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
        const struct cpumask *mask;
        unsigned int queue, cpu;

+       goto fallback;
+
        for (queue = 0; queue < set->nr_hw_queues; queue++) {
                mask = ib_get_vector_affinity(dev, first_vec + queue);
                if (!mask)
--
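
(For context: the hunk makes blk_mq_rdma_map_queues() skip the ib_get_vector_affinity() based mapping entirely and take its existing fallback, so every hw queue gets the default blk_mq_map_queues() spread instead of the device's IRQ affinity. Roughly, with the hunk applied the function looks like the sketch below; abbreviated from the 4.16 block/blk-mq-rdma.c, not an exact copy.)

--
int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
		struct ib_device *dev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	goto fallback;	/* the diagnostic hunk: never consult the device */

	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		/* per-vector IRQ affinity reported by the RDMA device */
		mask = ib_get_vector_affinity(dev, first_vec + queue);
		if (!mask)
			goto fallback;

		/* map every CPU in this vector's mask to this hw queue */
		for_each_cpu(cpu, mask)
			set->mq_map[cpu] = queue;
	}

	return 0;

fallback:
	/* default spread, ignoring device affinity */
	return blk_mq_map_queues(set);
}
--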


Hi Sagi

I can still reproduce this issue with the change applied:

[  133.469908] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[  133.554025] nvme nvme0: creating 40 I/O queues.
[  133.947648] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[  138.740870] smpboot: CPU 1 is now offline
[  138.778382] IRQ 37: no longer affine to CPU2
[  138.783153] IRQ 54: no longer affine to CPU2
[  138.787919] IRQ 70: no longer affine to CPU2
[  138.792687] IRQ 98: no longer affine to CPU2
[  138.797458] IRQ 140: no longer affine to CPU2
[  138.802319] IRQ 141: no longer affine to CPU2
[  138.807189] IRQ 166: no longer affine to CPU2
[  138.813622] smpboot: CPU 2 is now offline
[  139.043610] smpboot: CPU 3 is now offline
[  141.587283] print_req_error: operation not supported error, dev nvme0n1, sector 494622136
[  141.587303] print_req_error: operation not supported error, dev nvme0n1, sector 219643648
[  141.587304] print_req_error: operation not supported error, dev nvme0n1, sector 279256456
[  141.587306] print_req_error: operation not supported error, dev nvme0n1, sector 1208024
[  141.587322] print_req_error: operation not supported error, dev nvme0n1, sector 100575248
[  141.587335] print_req_error: operation not supported error, dev nvme0n1, sector 111717456
[  141.587346] print_req_error: operation not supported error, dev nvme0n1, sector 171939296
[  141.587348] print_req_error: operation not supported error, dev nvme0n1, sector 476420528
[  141.587353] print_req_error: operation not supported error, dev nvme0n1, sector 371566696
[  141.587356] print_req_error: operation not supported error, dev nvme0n1, sector 161758408
[  141.587463] Buffer I/O error on dev nvme0n1, logical block 54193430, lost async page write
[  141.587472] Buffer I/O error on dev nvme0n1, logical block 54193431, lost async page write
[  141.587478] Buffer I/O error on dev nvme0n1, logical block 54193432, lost async page write
[  141.587483] Buffer I/O error on dev nvme0n1, logical block 54193433, lost async page write
[  141.587532] Buffer I/O error on dev nvme0n1, logical block 54193476, lost async page write
[  141.587534] Buffer I/O error on dev nvme0n1, logical block 54193477, lost async page write
[  141.587536] Buffer I/O error on dev nvme0n1, logical block 54193478, lost async page write
[  141.587538] Buffer I/O error on dev nvme0n1, logical block 54193479, lost async page write
[  141.587540] Buffer I/O error on dev nvme0n1, logical block 54193480, lost async page write
[  141.587542] Buffer I/O error on dev nvme0n1, logical block 54193481, lost async page write
[  142.573522] nvme nvme0: Reconnecting in 10 seconds...
[  146.587532] buffer_io_error: 3743628 callbacks suppressed
[  146.587534] Buffer I/O error on dev nvme0n1, logical block 64832757, lost async page write
[  146.602837] Buffer I/O error on dev nvme0n1, logical block 64832758, lost async page write
[  146.612091] Buffer I/O error on dev nvme0n1, logical block 64832759, lost async page write
[  146.621346] Buffer I/O error on dev nvme0n1, logical block 64832760, lost async page write
[  146.630615] print_req_error: 556822 callbacks suppressed
[  146.630616] print_req_error: I/O error, dev nvme0n1, sector 518662176
[  146.643776] Buffer I/O error on dev nvme0n1, logical block 64832772, lost async page write
[  146.653030] Buffer I/O error on dev nvme0n1, logical block 64832773, lost async page write
[  146.662282] Buffer I/O error on dev nvme0n1, logical block 64832774, lost async page write
[  146.671542] print_req_error: I/O error, dev nvme0n1, sector 518662568
[  146.678754] Buffer I/O error on dev nvme0n1, logical block 64832821, lost async page write
[  146.688003] Buffer I/O error on dev nvme0n1, logical block 64832822, lost async page write
[  146.697784] print_req_error: I/O error, dev nvme0n1, sector 518662928
[  146.705450] Buffer I/O error on dev nvme0n1, logical block 64832866, lost async page write
[  146.715176] print_req_error: I/O error, dev nvme0n1, sector 518665376
[  146.722920] print_req_error: I/O error, dev nvme0n1, sector 518666136
[  146.730602] print_req_error: I/O error, dev nvme0n1, sector 518666920
[  146.738275] print_req_error: I/O error, dev nvme0n1, sector 518667880
[  146.745944] print_req_error: I/O error, dev nvme0n1, sector 518668096
[  146.753605] print_req_error: I/O error, dev nvme0n1, sector 518668960
[  146.761249] print_req_error: I/O error, dev nvme0n1, sector 518669616
[  149.010303] nvme nvme0: Identify namespace failed
[  149.016171] Dev nvme0n1: unable to read RDB block 0
[  149.022017]  nvme0n1: unable to read partition table
[  149.032192] nvme nvme0: Identify namespace failed
[  149.037857] Dev nvme0n1: unable to read RDB block 0
[  149.043695]  nvme0n1: unable to read partition table
[  153.081673] nvme nvme0: creating 37 I/O queues.
[  153.384977] BUG: unable to handle kernel paging request at 00003a9ed053bd48
[  153.393197] IP: blk_mq_get_request+0x23e/0x390
[  153.398585] PGD 0 P4D 0
[  153.401841] Oops: 0002 [#1] SMP PTI
[  153.406168] Modules linked in: nvme_rdma nvme_fabrics nvme_core nvmet_rdma nvmet sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tabt
[  153.489688]  drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core ahci libahci crc32c_intel libata tg3 i2c_core dd
[  153.509370] CPU: 32 PID: 689 Comm: kworker/u369:6 Not tainted 4.16.0-rc7.sagi+ #4
[  153.518417] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[  153.527486] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[  153.535695] RIP: 0010:blk_mq_get_request+0x23e/0x390
[  153.541973] RSP: 0018:ffffb8cc0853fca8 EFLAGS: 00010246
[  153.548530] RAX: 00003a9ed053bd00 RBX: ffff9e2cbbf30000 RCX: 000000000000001f
[  153.557230] RDX: 0000000000000000 RSI: ffffffe19b5ba5d2 RDI: ffff9e2c90219000
[  153.565923] RBP: ffffb8cc0853fce8 R08: ffffffffffffffff R09: 0000000000000002
[  153.574628] R10: ffff9e1cbea27160 R11: fffff20780005c00 R12: 0000000000000023
[  153.583340] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  153.592062] FS:  0000000000000000(0000) GS:ffff9e1cbea00000(0000) knlGS:0000000000000000
[  153.601846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  153.609013] CR2: 00003a9ed053bd48 CR3: 00000014b560a003 CR4: 00000000001606e0
[  153.617732] Call Trace:
[  153.621221]  blk_mq_alloc_request_hctx+0xf2/0x140
[  153.627244]  nvme_alloc_request+0x36/0x60 [nvme_core]
[  153.633647]  __nvme_submit_sync_cmd+0x2b/0xd0 [nvme_core]
[  153.640429]  nvmf_connect_io_queue+0x10e/0x170 [nvme_fabrics]
[  153.647613]  nvme_rdma_start_queue+0x21/0x80 [nvme_rdma]
[  153.654300]  nvme_rdma_configure_io_queues+0x196/0x280 [nvme_rdma]
[  153.661947]  nvme_rdma_reconnect_ctrl_work+0x39/0xd0 [nvme_rdma]
[  153.669394]  process_one_work+0x158/0x360
[  153.674618]  worker_thread+0x47/0x3e0
[  153.679458]  kthread+0xf8/0x130
[  153.683717]  ? max_active_store+0x80/0x80
[  153.688952]  ? kthread_bind+0x10/0x10
[  153.693809]  ret_from_fork+0x35/0x40
[  153.698569] Code: 89 83 40 01 00 00 45 84 e4 48 c7 83 48 01 00 00 00 00 00 00 ba 01 00 00 00 48 8b 45 10 74 0c 31 d2 41 f7 c4 00 08 06 00 0
[  153.721261] RIP: blk_mq_get_request+0x23e/0x390 RSP: ffffb8cc0853fca8
[  153.729264] CR2: 00003a9ed053bd48
[  153.733833] ---[ end trace f77c1388aba74f1c ]---

_______________________________________________
Linux-nvme mailing list
Linux-nvme@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-nvme



