This issue was fixed in commit:
"mlx5: fix mlx5_get_vector_affinity to start from completion vector 0"
and should make it into a stable release (probably 4.15.5).
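Some background on why the bug shows up as connect failures: nvme-rdma maps its I/O queues to CPUs using the per-completion-vector IRQ affinity that mlx5 exposes through ib_get_vector_affinity(). If the helper doesn't count from completion vector 0, the masks blk-mq sees are shifted and some hardware queues end up with no CPU mapped to them. The intended behaviour is roughly the sketch below (illustrative only, not the exact patch; MLX5_EQ_VEC_COMP_BASE stands for the number of non-completion IRQ vectors that sit before the completion vectors):

	#include <linux/pci.h>
	#include <linux/mlx5/driver.h>

	/* Illustrative sketch, not the upstream patch: ask the PCI layer for
	 * the affinity of completion vector 'vector', counting completion
	 * vectors from 0 (they come right after the async/command vectors). */
	static inline const struct cpumask *
	get_comp_vector_affinity(struct mlx5_core_dev *dev, int vector)
	{
		return pci_irq_get_affinity(dev->pdev,
					    MLX5_EQ_VEC_COMP_BASE + vector);
	}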
please try it :)
-Max.
On 2/22/2018 12:10 PM, Leon Romanovsky wrote:
On Thu, Feb 22, 2018 at 09:10:00AM +0000, Kalderon, Michal wrote:
Hi Leon, Sagi,
We're trying to run simple nvmf connect over connectx4 on kernel 4.15.4 and we're hitting the following
kernel panic on the initiator side.
Are there any known issues on this kernel?
Server configuration
[root@lbtlvb-pcie157 linux-4.15.4]# nvmetcli
/> ls
o- / ......................................................................................................................... [...]
o- hosts ................................................................................................................... [...]
o- ports ................................................................................................................... [...]
| o- 1 ..................................................................................................................... [...]
| o- referrals ........................................................................................................... [...]
| o- subsystems .......................................................................................................... [...]
| o- nvme-subsystem-tmp ............................................................................................... [...]
o- subsystems .............................................................................................................. [...]
o- nvme-subsystem-tmp ................................................................................................... [...]
o- allowed_hosts ....................................................................................................... [...]
o- namespaces .......................................................................................................... [...]
o- 1 ................................................................................................................. [...]
Discovery is successful with the following command:
nvme discover -t rdma -a 192.168.20.157 -s 1023
Discovery Log Number of Records 1, Generation counter 1
=====Discovery Log Entry 0======
trtype: rdma
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 1
trsvcid: 1023
subnqn: nvme-subsystem-tmp
traddr: 192.168.20.157
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms: rdma-cm
rdma_pkey: 0x0000
When running connect as follows, we get the kernel panic
nvme connect -t rdma -n nvme-subsystem-tmp -a 192.168.20.157 -s 1023
Please advise how to proceed.
Thanks,
Michal
[ 663.010545] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.20.157:1023
[ 663.052781] nvme nvme0: creating 24 I/O queues.
[ 663.408093] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[ 663.409116] nvme nvme0: failed to connect queue: 3 ret=-18
I'm not an NVMe-oF expert, but error code -18 is -EXDEV, and not many places
in the code can return it. It is also negative, which means it is a Linux
errno rather than an NVMe status code.
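(For reference, 18 is EXDEV in include/uapi/asm-generic/errno-base.h:)

	#define	EXDEV		18	/* Cross-device link */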
So based on 4.16-rc1 code, the flow is:
nvme_rdma_start_queue ->
nvmf_connect_io_queue ->
__nvme_submit_sync_cmd ->
nvme_alloc_request ->
blk_mq_alloc_request_hctx ->
	/*
	 * Check if the hardware context is actually mapped to anything.
	 * If not tell the caller that it should skip this queue.
	 */
	alloc_data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(alloc_data.hctx)) {
		blk_queue_exit(q);
		return ERR_PTR(-EXDEV);
	}
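For completeness, that "mapped" check boils down to whether any CPU was
assigned to the hardware context, and for nvme-rdma the CPU <-> queue map is
built from the affinity masks the RDMA device reports per completion vector.
Roughly (paraphrased from memory, block/blk-mq.h and block/blk-mq-rdma.c):

	/* An hctx counts as "mapped" only if at least one software context
	 * (CPU) points at it and it has a tag set. */
	static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
	{
		return hctx->nr_ctx && hctx->tags;
	}

	/* Paraphrase of blk_mq_rdma_map_queues(): build the CPU -> hw queue
	 * map from the affinity mask of each completion vector, as reported
	 * by the RDMA driver (mlx5 in this case). */
	int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
			struct ib_device *dev, int first_vec)
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < set->nr_hw_queues; queue++) {
			mask = ib_get_vector_affinity(dev, first_vec + queue);
			if (!mask)
				return blk_mq_map_queues(set);	/* fallback */

			for_each_cpu(cpu, mask)
				set->mq_map[cpu] = queue;
		}

		return 0;
	}

So if the affinity masks the driver hands back are shifted or empty, some
hctxs end up with nr_ctx == 0, blk_mq_alloc_request_hctx() returns -EXDEV for
those queues, and __nvme_submit_sync_cmd() propagates it back up as the
ret=-18 seen in the log.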
Hope it helps.
Thanks