Re: nvme-rdma and rdma comp vector affinity problem

Hey Sagi and Christoph,

Do you all have any thoughts on this? It looks like a bug in nvme-rdma
or in the blk-mq code. I can debug it further if we agree that it is a
bug...

Thanks,

Steve.


On 7/9/2018 2:25 PM, Steve Wise wrote:
> Hey Sagi,
>
> I'm adding cxgb4 support for ib_get_vector_affinity(), and I see an
> error when connecting via nvme-rdma with certain affinity settings for
> my comp vectors. The error I see is:
>
> [root@stevo1 linux]# nvme connect-all -t rdma -a 172.16.2.1
> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>
> And this gets logged:
>
> [  590.357506] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> [  590.364730] nvme nvme0: failed to connect queue: 2 ret=-18
>
> The EXDEV error is being returned by blk_mq_alloc_request_hctx() because
> blk_mq_hw_queue_mapped() fails. This only happens when I set up my
> vector affinity such that there is overlap, i.e. if 2 comp vectors are
> assigned to the same CPU, then I see this failure. If each vector is
> mapped to its own CPU, then it works. I added some debug output to my
> cxgb4 get_comp_vector_affinity() and a WARN_ONCE() in
> blk_mq_alloc_request_hctx(); the output is below.
>
> I would think that the vector affinity shouldn't cause connection
> failures.  Any ideas?  Thanks!
>
> Steve.
>
> [  433.528743] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:228c41cb-86c1-4aca-8a10-6e8d8c7998a0.
> [  433.545267] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.16.2.1:4420
> [  433.554972] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> [  433.604610] nvmet: creating controller 1 for subsystem nvme-nullb0 for NQN nqn.2014-08.org.nvmexpress:uuid:228c41cb-86c1-4aca-8a10-6e8d8c7998a0.
> [  433.619048] nvme nvme0: creating 16 I/O queues.
> [  433.643746] iw_cxgb4: comp_vector 0, irq 217 mask 0x100
> [  433.649630] iw_cxgb4: comp_vector 1, irq 218 mask 0x200
> [  433.655501] iw_cxgb4: comp_vector 2, irq 219 mask 0x400
> [  433.661379] iw_cxgb4: comp_vector 3, irq 220 mask 0x800
> [  433.667243] iw_cxgb4: comp_vector 4, irq 221 mask 0x1000
> [  433.673179] iw_cxgb4: comp_vector 5, irq 222 mask 0x2000
> [  433.679110] iw_cxgb4: comp_vector 6, irq 223 mask 0x4000
> [  433.685020] iw_cxgb4: comp_vector 7, irq 224 mask 0x8000
> [  433.690928] iw_cxgb4: comp_vector 8, irq 225 mask 0x100
> [  433.696736] iw_cxgb4: comp_vector 9, irq 226 mask 0x200
> [  433.702531] iw_cxgb4: comp_vector 10, irq 227 mask 0x400
> [  433.708401] iw_cxgb4: comp_vector 11, irq 228 mask 0x800
> [  433.714277] iw_cxgb4: comp_vector 12, irq 229 mask 0x1000
> [  433.720208] iw_cxgb4: comp_vector 13, irq 230 mask 0x2000
> [  433.726138] iw_cxgb4: comp_vector 14, irq 231 mask 0x4000
> [  433.732051] iw_cxgb4: comp_vector 15, irq 232 mask 0x8000
> [  433.739894] ------------[ cut here ]------------
> [  433.745026] blk_mq_alloc_request_hctx hw_queue not mapped!
> [  433.751030] WARNING: CPU: 6 PID: 9950 at block/blk-mq.c:454 blk_mq_alloc_request_hctx+0x163/0x180
> [  433.760396] Modules linked in: nvmet_rdma nvmet null_blk nvme_rdma
> nvme_fabrics nvme_core mlx5_ib mlx5_core mlxfw rdma_ucm ib_uverbs
> iw_cxgb4 rdma_cm iw_cm ib_cm ib_core cxgb4 iscsi_target_mod libiscsi
> scsi_transport_iscsi target_core_mod libcxgb vfat fat intel_rapl sb_edac
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd cryptd glue_helper iTCO_wdt iTCO_vendor_support mxm_wmi
> pcspkr joydev mei_me devlink ipmi_si sg mei i2c_i801 ipmi_devintf
> lpc_ich ioatdma ipmi_msghandler wmi nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm isci libsas igb
> ahci scsi_transport_sas libahci libata crc32c_intel dca i2c_algo_bit
> [  433.835278]  i2c_core [last unloaded: mlxfw]
> [  433.840150] CPU: 6 PID: 9950 Comm: nvme Kdump: loaded Tainted: G        W         4.18.0-rc1+ #131
> [  433.849714] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
> [  433.857301] RIP: 0010:blk_mq_alloc_request_hctx+0x163/0x180
> [  433.863493] Code: 0f 0b 48 c7 c0 ea ff ff ff e9 1a ff ff ff 48 c7 c6
> e0 34 c8 bd 48 c7 c7 bb e4 ea bd 31 c0 c6 05 bc d1 e8 00 01 e8 bd 96 d1
> ff <0f> 0b 48 c7 c0 ee ff ff ff e9 f0 fe ff ff 0f 1f 44 00 00 66 2e 0f
> [  433.883625] RSP: 0018:ffffab7f4790bba8 EFLAGS: 00010286
> [  433.889481] RAX: 0000000000000000 RBX: ffff918412ab9360 RCX: 0000000000000000
> [  433.897252] RDX: 0000000000000001 RSI: ffff91841fd96978 RDI: ffff91841fd96978
> [  433.905014] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000057d
> [  433.912782] R10: 00000000000003ff R11: 0000000000aaaaaa R12: 0000000000000023
> [  433.920555] R13: ffffab7f4790bc50 R14: 0000000000000400 R15: 0000000000000000
> [  433.928325] FS:  00007f54566d6780(0000) GS:ffff91841fd80000(0000) knlGS:0000000000000000
> [  433.937040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  433.943418] CR2: 00007f5456000610 CR3: 0000000858f58003 CR4: 00000000000606e0
> [  433.951178] Call Trace:
> [  433.954241]  nvme_alloc_request+0x36/0x80 [nvme_core]
> [  433.959891]  __nvme_submit_sync_cmd+0x2b/0xd0 [nvme_core]
> [  433.965884]  nvmf_connect_io_queue+0x10e/0x170 [nvme_fabrics]
> [  433.972215]  nvme_rdma_start_queue+0x21/0x80 [nvme_rdma]
> [  433.978100]  nvme_rdma_configure_io_queues+0x196/0x280 [nvme_rdma]
> [  433.984846]  nvme_rdma_create_ctrl+0x4f9/0x640 [nvme_rdma]
> [  433.990901]  nvmf_dev_write+0x954/0xaf8 [nvme_fabrics]
> [  433.996614]  __vfs_write+0x33/0x190
> [  434.000681]  ? list_lru_add+0x97/0x140
> [  434.005015]  ? __audit_syscall_entry+0xd7/0x160
> [  434.010135]  vfs_write+0xad/0x1a0
> [  434.014039]  ksys_write+0x52/0xc0
> [  434.017959]  do_syscall_64+0x55/0x180
> [  434.022222]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  434.027880] RIP: 0033:0x7f5455fda840
> [  434.032061] Code: 73 01 c3 48 8b 0d 48 26 2d 00 f7 d8 64 89 01 48 83
> c8 ff c3 66 0f 1f 44 00 00 83 3d 3d 87 2d 00 00 75 10 b8 01 00 00 00 0f
> 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce c6 01 00 48 89 04 24
> [  434.052217] RSP: 002b:00007ffc930111e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  434.060449] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f5455fda840
> [  434.068266] RDX: 000000000000003d RSI: 00007ffc93012260 RDI: 0000000000000003
> [  434.076088] RBP: 00007ffc93012260 R08: 00007f5455f39988 R09: 000000000000000d
> [  434.083911] R10: 0000000000000004 R11: 0000000000000246 R12: 000000000000003d
> [  434.091736] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000001
> [  434.099555] ---[ end trace 9f5bec6eef77fae9 ]---
> [  434.104864] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> [  434.112235] nvme nvme0: failed to connect queue: 2 ret=-18
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-nvme

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


