On 04/04/2018 09:22 PM, Sagi Grimberg wrote:
On 03/30/2018 12:32 PM, Yi Zhang wrote:
Hello
I got this kernel BUG on 4.16.0-rc7 during my NVMeoF RDMA testing; here is the reproducer and log.
Let me know if you need more info, thanks.
Reproducer:
1. setup target
#nvmetcli restore /etc/rdma.json
2. connect target on host
#nvme connect-all -t rdma -a $IP -s 4420
3. do fio background on host
#fio -filename=/dev/nvme0n1 -iodepth=1 -thread -rw=randwrite
-ioengine=psync
-bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10
-bs_unaligned -runtime=180 -size=-group_reporting -name=mytest
-numjobs=60 &
4. offline cpu on host
#echo 0 > /sys/devices/system/cpu/cpu1/online
#echo 0 > /sys/devices/system/cpu/cpu2/online
#echo 0 > /sys/devices/system/cpu/cpu3/online
5. clear target
#nvmetcli clear
6. restore target
#nvmetcli restore /etc/rdma.json
7. check console log on host
Hi Yi,
Does this happen with this applied?
--
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..b89da55e8aaa 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -35,6 +35,8 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
 	const struct cpumask *mask;
 	unsigned int queue, cpu;
 
+	goto fallback;
+
 	for (queue = 0; queue < set->nr_hw_queues; queue++) {
 		mask = ib_get_vector_affinity(dev, first_vec + queue);
 		if (!mask)
--
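For context, the unpatched mapping routine looks roughly like the sketch below (paraphrased from block/blk-mq-rdma.c of that kernel generation, so treat it as an illustration rather than the exact source). The hunk above makes the function jump straight to the fallback, i.e. use the plain blk_mq_map_queues() mapping instead of the IRQ-affinity-based one:

/*
 * Rough sketch of blk_mq_rdma_map_queues() around v4.16 (not verbatim).
 * Each hw queue is normally mapped to the CPUs in the IRQ affinity mask
 * of its completion vector; if any vector has no affinity mask, the
 * generic CPU-to-queue mapping is used instead.
 */
int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
		struct ib_device *dev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	/* the test patch above jumps to the fallback right here */
	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		mask = ib_get_vector_affinity(dev, first_vec + queue);
		if (!mask)
			goto fallback;

		for_each_cpu(cpu, mask)
			set->mq_map[cpu] = queue;
	}

	return 0;

fallback:
	return blk_mq_map_queues(set);
}

So the experiment presumably checks whether the crash depends on the ib_get_vector_affinity()-based queue mapping, or whether it also reproduces with the default blk_mq_map_queues() mapping.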
Hi Sagi
I can still reproduce this issue with the change:
[ 133.469908] nvme nvme0: new ctrl: NQN
"nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[ 133.554025] nvme nvme0: creating 40 I/O queues.
[ 133.947648] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[ 138.740870] smpboot: CPU 1 is now offline
[ 138.778382] IRQ 37: no longer affine to CPU2
[ 138.783153] IRQ 54: no longer affine to CPU2
[ 138.787919] IRQ 70: no longer affine to CPU2
[ 138.792687] IRQ 98: no longer affine to CPU2
[ 138.797458] IRQ 140: no longer affine to CPU2
[ 138.802319] IRQ 141: no longer affine to CPU2
[ 138.807189] IRQ 166: no longer affine to CPU2
[ 138.813622] smpboot: CPU 2 is now offline
[ 139.043610] smpboot: CPU 3 is now offline
[ 141.587283] print_req_error: operation not supported error, dev
nvme0n1, sector 494622136
[ 141.587303] print_req_error: operation not supported error, dev
nvme0n1, sector 219643648
[ 141.587304] print_req_error: operation not supported error, dev
nvme0n1, sector 279256456
[ 141.587306] print_req_error: operation not supported error, dev
nvme0n1, sector 1208024
[ 141.587322] print_req_error: operation not supported error, dev
nvme0n1, sector 100575248
[ 141.587335] print_req_error: operation not supported error, dev
nvme0n1, sector 111717456
[ 141.587346] print_req_error: operation not supported error, dev
nvme0n1, sector 171939296
[ 141.587348] print_req_error: operation not supported error, dev
nvme0n1, sector 476420528
[ 141.587353] print_req_error: operation not supported error, dev
nvme0n1, sector 371566696
[ 141.587356] print_req_error: operation not supported error, dev
nvme0n1, sector 161758408
[ 141.587463] Buffer I/O error on dev nvme0n1, logical block 54193430,
lost async page write
[ 141.587472] Buffer I/O error on dev nvme0n1, logical block 54193431,
lost async page write
[ 141.587478] Buffer I/O error on dev nvme0n1, logical block 54193432,
lost async page write
[ 141.587483] Buffer I/O error on dev nvme0n1, logical block 54193433,
lost async page write
[ 141.587532] Buffer I/O error on dev nvme0n1, logical block 54193476,
lost async page write
[ 141.587534] Buffer I/O error on dev nvme0n1, logical block 54193477,
lost async page write
[ 141.587536] Buffer I/O error on dev nvme0n1, logical block 54193478,
lost async page write
[ 141.587538] Buffer I/O error on dev nvme0n1, logical block 54193479,
lost async page write
[ 141.587540] Buffer I/O error on dev nvme0n1, logical block 54193480,
lost async page write
[ 141.587542] Buffer I/O error on dev nvme0n1, logical block 54193481,
lost async page write
[ 142.573522] nvme nvme0: Reconnecting in 10 seconds...
[ 146.587532] buffer_io_error: 3743628 callbacks suppressed
[ 146.587534] Buffer I/O error on dev nvme0n1, logical block 64832757,
lost async page write
[ 146.602837] Buffer I/O error on dev nvme0n1, logical block 64832758,
lost async page write
[ 146.612091] Buffer I/O error on dev nvme0n1, logical block 64832759,
lost async page write
[ 146.621346] Buffer I/O error on dev nvme0n1, logical block 64832760,
lost async page write
[ 146.630615] print_req_error: 556822 callbacks suppressed
[ 146.630616] print_req_error: I/O error, dev nvme0n1, sector 518662176
[ 146.643776] Buffer I/O error on dev nvme0n1, logical block 64832772,
lost async page write
[ 146.653030] Buffer I/O error on dev nvme0n1, logical block 64832773,
lost async page write
[ 146.662282] Buffer I/O error on dev nvme0n1, logical block 64832774,
lost async page write
[ 146.671542] print_req_error: I/O error, dev nvme0n1, sector 518662568
[ 146.678754] Buffer I/O error on dev nvme0n1, logical block 64832821,
lost async page write
[ 146.688003] Buffer I/O error on dev nvme0n1, logical block 64832822,
lost async page write
[ 146.697784] print_req_error: I/O error, dev nvme0n1, sector 518662928
[ 146.705450] Buffer I/O error on dev nvme0n1, logical block 64832866,
lost async page write
[ 146.715176] print_req_error: I/O error, dev nvme0n1, sector 518665376
[ 146.722920] print_req_error: I/O error, dev nvme0n1, sector 518666136
[ 146.730602] print_req_error: I/O error, dev nvme0n1, sector 518666920
[ 146.738275] print_req_error: I/O error, dev nvme0n1, sector 518667880
[ 146.745944] print_req_error: I/O error, dev nvme0n1, sector 518668096
[ 146.753605] print_req_error: I/O error, dev nvme0n1, sector 518668960
[ 146.761249] print_req_error: I/O error, dev nvme0n1, sector 518669616
[ 149.010303] nvme nvme0: Identify namespace failed
[ 149.016171] Dev nvme0n1: unable to read RDB block 0
[ 149.022017] nvme0n1: unable to read partition table
[ 149.032192] nvme nvme0: Identify namespace failed
[ 149.037857] Dev nvme0n1: unable to read RDB block 0
[ 149.043695] nvme0n1: unable to read partition table
[ 153.081673] nvme nvme0: creating 37 I/O queues.
[ 153.384977] BUG: unable to handle kernel paging request at
00003a9ed053bd48
[ 153.393197] IP: blk_mq_get_request+0x23e/0x390
[ 153.398585] PGD 0 P4D 0
[ 153.401841] Oops: 0002 [#1] SMP PTI
[ 153.406168] Modules linked in: nvme_rdma nvme_fabrics nvme_core
nvmet_rdma nvmet sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tabt
[ 153.489688] drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm mlx4_core ahci libahci crc32c_intel libata tg3
i2c_core dd
[ 153.509370] CPU: 32 PID: 689 Comm: kworker/u369:6 Not tainted
4.16.0-rc7.sagi+ #4
[ 153.518417] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS
1.6.2 01/08/2016
[ 153.527486] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[ 153.535695] RIP: 0010:blk_mq_get_request+0x23e/0x390
[ 153.541973] RSP: 0018:ffffb8cc0853fca8 EFLAGS: 00010246
[ 153.548530] RAX: 00003a9ed053bd00 RBX: ffff9e2cbbf30000 RCX:
000000000000001f
[ 153.557230] RDX: 0000000000000000 RSI: ffffffe19b5ba5d2 RDI:
ffff9e2c90219000
[ 153.565923] RBP: ffffb8cc0853fce8 R08: ffffffffffffffff R09:
0000000000000002
[ 153.574628] R10: ffff9e1cbea27160 R11: fffff20780005c00 R12:
0000000000000023
[ 153.583340] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[ 153.592062] FS: 0000000000000000(0000) GS:ffff9e1cbea00000(0000)
knlGS:0000000000000000
[ 153.601846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 153.609013] CR2: 00003a9ed053bd48 CR3: 00000014b560a003 CR4:
00000000001606e0
[ 153.617732] Call Trace:
[ 153.621221] blk_mq_alloc_request_hctx+0xf2/0x140
[ 153.627244] nvme_alloc_request+0x36/0x60 [nvme_core]
[ 153.633647] __nvme_submit_sync_cmd+0x2b/0xd0 [nvme_core]
[ 153.640429] nvmf_connect_io_queue+0x10e/0x170 [nvme_fabrics]
[ 153.647613] nvme_rdma_start_queue+0x21/0x80 [nvme_rdma]
[ 153.654300] nvme_rdma_configure_io_queues+0x196/0x280 [nvme_rdma]
[ 153.661947] nvme_rdma_reconnect_ctrl_work+0x39/0xd0 [nvme_rdma]
[ 153.669394] process_one_work+0x158/0x360
[ 153.674618] worker_thread+0x47/0x3e0
[ 153.679458] kthread+0xf8/0x130
[ 153.683717] ? max_active_store+0x80/0x80
[ 153.688952] ? kthread_bind+0x10/0x10
[ 153.693809] ret_from_fork+0x35/0x40
[ 153.698569] Code: 89 83 40 01 00 00 45 84 e4 48 c7 83 48 01 00 00 00
00 00 00 ba 01 00 00 00 48 8b 45 10 74 0c 31 d2 41 f7 c4 00 08 06 00 0
[ 153.721261] RIP: blk_mq_get_request+0x23e/0x390 RSP: ffffb8cc0853fca8
[ 153.729264] CR2: 00003a9ed053bd48
[ 153.733833] ---[ end trace f77c1388aba74f1c ]---
_______________________________________________
Linux-nvme mailing list
Linux-nvme@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-nvme