> From: Max Gurtovoy [mailto:maxg@xxxxxxxxxxxx]
> Sent: Thursday, February 22, 2018 1:04 PM
>
> This issue was fixed in commit:
>
> "mlx5: fix mlx5_get_vector_affinity to start from completion vector 0"
>
> and should be added to stable soon (probably in 4.15.5).
>
> please try it :)

That fixed the issue. Thanks for the quick response.

> -Max.
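For readers who find this thread later: the commit title points at an
indexing bug in the helper that translates a completion vector number
into an IRQ affinity mask, which nvme-rdma uses (via
ib_get_vector_affinity()) to map block-layer queues to CPUs. Below is a
rough sketch of that class of bug; the function name and the offset
macro are illustrative assumptions, not the actual mlx5 code:

	/* Illustrative sketch only -- not the real mlx5 patch.
	 *
	 * Devices commonly reserve the first IRQ vector(s) for
	 * control/async events, with completion vectors following.
	 * A helper like this must translate comp_vector == 0 into the
	 * first *completion* IRQ; if the base offset is off by one,
	 * every lookup is shifted, some queues get the wrong (or no)
	 * affinity mask, and the corresponding hardware contexts are
	 * never mapped to any CPU.
	 */
	#define COMP_IRQ_BASE 1	/* hypothetical: vector 0 is the control IRQ */

	static const struct cpumask *
	get_comp_vector_affinity(struct pci_dev *pdev, int comp_vector)
	{
		return pci_irq_get_affinity(pdev, COMP_IRQ_BASE + comp_vector);
	}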
> On 2/22/2018 12:10 PM, Leon Romanovsky wrote:
> > On Thu, Feb 22, 2018 at 09:10:00AM +0000, Kalderon, Michal wrote:
> >> Hi Leon, Sagi,
> >>
> >> We're trying to run a simple NVMe-oF connect over ConnectX-4 on kernel
> >> 4.15.4, and we're hitting the following kernel panic on the initiator side.
> >> Are there any known issues on this kernel?
> >>
> >> Server configuration:
> >> [root@lbtlvb-pcie157 linux-4.15.4]# nvmetcli
> >> /> ls
> >> o- / .......................................................... [...]
> >>   o- hosts .................................................... [...]
> >>   o- ports .................................................... [...]
> >>   | o- 1 ...................................................... [...]
> >>   |   o- referrals ............................................ [...]
> >>   |   o- subsystems ........................................... [...]
> >>   |     o- nvme-subsystem-tmp ................................. [...]
> >>   o- subsystems ............................................... [...]
> >>     o- nvme-subsystem-tmp ..................................... [...]
> >>       o- allowed_hosts ........................................ [...]
> >>       o- namespaces ........................................... [...]
> >>         o- 1 .................................................. [...]
> >>
> >> Discovery is successful with the following command:
> >> nvme discover -t rdma -a 192.168.20.157 -s 1023
> >>
> >> Discovery Log Number of Records 1, Generation counter 1
> >> =====Discovery Log Entry 0======
> >> trtype:  rdma
> >> adrfam:  ipv4
> >> subtype: nvme subsystem
> >> treq:    not specified
> >> portid:  1
> >> trsvcid: 1023
> >>
> >> subnqn:  nvme-subsystem-tmp
> >> traddr:  192.168.20.157
> >>
> >> rdma_prtype: not specified
> >> rdma_qptype: connected
> >> rdma_cms:    rdma-cm
> >> rdma_pkey:   0x0000
> >>
> >> When running connect as follows, we get the kernel panic:
> >> nvme connect -t rdma -n nvme-subsystem-tmp -a 192.168.20.157 -s 1023
> >>
> >> Please advise how to proceed.
> >>
> >> Thanks,
> >> Michal
> >>
> >> [ 663.010545] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.20.157:1023
> >> [ 663.052781] nvme nvme0: creating 24 I/O queues.
> >> [ 663.408093] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> >> [ 663.409116] nvme nvme0: failed to connect queue: 3 ret=-18
> >
> > I'm not an NVMe-oF expert, but error code -18 means EXDEV, and not many
> > places in the code can return it. Also, since it is negative, it is a
> > Linux errno and not an NVMe status code.
> >
> > So based on the 4.16-rc1 code, the flow is:
> > nvme_rdma_start_queue ->
> >  nvmf_connect_io_queue ->
> >   __nvme_submit_sync_cmd ->
> >    nvme_alloc_request ->
> >     blk_mq_alloc_request_hctx ->
> >
> > 437         /*
> > 438          * Check if the hardware context is actually mapped to anything.
> > 439          * If not tell the caller that it should skip this queue.
> > 440          */
> > 441         alloc_data.hctx = q->queue_hw_ctx[hctx_idx];
> > 442         if (!blk_mq_hw_queue_mapped(alloc_data.hctx)) {
> > 443                 blk_queue_exit(q);
> > 444                 return ERR_PTR(-EXDEV);
> > 445         }
> >
> > Hope it helps.
> >
> > Thanks
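A footnote on the -EXDEV path quoted above: "mapped" boils down to a
small predicate in block/blk-mq.h. The snippet below is my paraphrase
of the v4.16-era helper (verify it against the exact tree you are
debugging):

	/* A hardware context is usable only if at least one software
	 * (per-CPU) queue maps to it and it has a tag set.  With a
	 * broken vector-affinity helper, the hctx for an affected
	 * completion vector ends up with nr_ctx == 0, so
	 * blk_mq_alloc_request_hctx() returns ERR_PTR(-EXDEV) and
	 * nvme-rdma reports "failed to connect queue" with ret=-18.
	 */
	static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
	{
		return hctx->nr_ctx && hctx->tags;
	}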