On Thu, Feb 22, 2018 at 09:10:00AM +0000, Kalderon, Michal wrote:
> Hi Leon, Sagi,
>
> We're trying to run a simple nvmf connect over ConnectX-4 on kernel 4.15.4, and we're hitting the following
> kernel panic on the initiator side.
> Are there any known issues on this kernel?
>
> Server configuration:
> [root@lbtlvb-pcie157 linux-4.15.4]# nvmetcli
> /> ls
> o- / .......................................................................... [...]
>   o- hosts .................................................................... [...]
>   o- ports .................................................................... [...]
>   | o- 1 ...................................................................... [...]
>   |   o- referrals ............................................................ [...]
>   |   o- subsystems ........................................................... [...]
>   |     o- nvme-subsystem-tmp ................................................. [...]
>   o- subsystems ............................................................... [...]
>     o- nvme-subsystem-tmp ..................................................... [...]
>       o- allowed_hosts ........................................................ [...]
>       o- namespaces ........................................................... [...]
>         o- 1 .................................................................. [...]
>
> Discovery is successful with the following command:
> nvme discover -t rdma -a 192.168.20.157 -s 1023
>
> Discovery Log Number of Records 1, Generation counter 1
> =====Discovery Log Entry 0======
> trtype:  rdma
> adrfam:  ipv4
> subtype: nvme subsystem
> treq:    not specified
> portid:  1
> trsvcid: 1023
> subnqn:  nvme-subsystem-tmp
> traddr:  192.168.20.157
>
> rdma_prtype: not specified
> rdma_qptype: connected
> rdma_cms:    rdma-cm
> rdma_pkey:   0x0000
>
> When running connect as follows, we get the kernel panic:
> nvme connect -t rdma -n nvme-subsystem-tmp -a 192.168.20.157 -s 1023
>
> Please advise how to proceed.
>
> Thanks,
> Michal
>
> [  663.010545] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.20.157:1023
> [  663.052781] nvme nvme0: creating 24 I/O queues.
> [  663.408093] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> [  663.409116] nvme nvme0: failed to connect queue: 3 ret=-18

I'm not an NVMe-oF expert, but error code -18 is EXDEV, and not many places in the code can return it. It is also negative, so it is a Linux errno and not an NVMe status code.

Based on the 4.16-rc1 code, the flow is:

nvme_rdma_start_queue
  -> nvmf_connect_io_queue
    -> __nvme_submit_sync_cmd
      -> nvme_alloc_request
        -> blk_mq_alloc_request_hctx

437         /*
438          * Check if the hardware context is actually mapped to anything.
439          * If not tell the caller that it should skip this queue.
440          */
441         alloc_data.hctx = q->queue_hw_ctx[hctx_idx];
442         if (!blk_mq_hw_queue_mapped(alloc_data.hctx)) {
443                 blk_queue_exit(q);
444                 return ERR_PTR(-EXDEV);
445         }

Hope it helps.

Thanks