update the subject to better describe the issue:

So I tried this on one nvme/rdma environment, and it was also
reproducible. Here are the steps:

# echo 0 > /sys/devices/system/cpu/cpu0/online
# dmesg | tail -10
[  781.577235] smpboot: CPU 0 is now offline
# nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
no controller found: failed to write to nvme-fabrics device
# dmesg
[  781.577235] smpboot: CPU 0 is now offline
[  799.471627] nvme nvme0: creating 39 I/O queues.
[  801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
[  801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[  801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
This is because of blk_mq_alloc_request_hctx(), and it was raised before.
IIRC there was reluctance to make it allocate a request for an hctx even
if its associated mapped CPU is offline. The latest attempt was from Ming:

[PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx

Don't know where that went, though...