Please update the subject to better describe the issue.
I tried this on an nvme/rdma environment and the issue is reproducible
there as well; here are the steps:
# echo 0 >/sys/devices/system/cpu/cpu0/online
# dmesg | tail -10
[ 781.577235] smpboot: CPU 0 is now offline
# nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
no controller found: failed to write to nvme-fabrics device
# dmesg
[ 781.577235] smpboot: CPU 0 is now offline
[ 799.471627] nvme nvme0: creating 39 I/O queues.
[ 801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
[ 801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[ 801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
This is caused by blk_mq_alloc_request_hctx() and has been raised before.
IIRC there was reluctance to let it allocate a request for an hctx whose
associated mapped CPU is offline.
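For reference, ret=-18 in the log above is -EXDEV ("Invalid cross-device
link"), which is what blk_mq_alloc_request_hctx() returns when none of the
CPUs mapped to the requested hctx is online. Roughly paraphrased from
block/blk-mq.c (the exact code differs between kernel versions, so treat
this as illustrative only):

	/*
	 * Paraphrased excerpt of blk_mq_alloc_request_hctx(); details
	 * vary between kernel versions.
	 */
	data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;		/* ret == -EXDEV */
	/*
	 * The request must run on a CPU that is mapped to this hctx.
	 * If every such CPU is offline, there is nothing to pick, so
	 * the allocation fails with -EXDEV.
	 */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		goto out_queue_exit;		/* ret == -EXDEV */
	data.ctx = __blk_mq_get_ctx(q, cpu);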
The latest attempt was from Ming:
[PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx
Don't know where that went though...
That attempt relied on the queue used for connecting the I/O queues having
a non-managed IRQ; unfortunately that is not true for all drivers, so that
approach can't work.
The only consumer of blk_mq_alloc_request_hctx() is nvme-fabrics, so other
drivers don't matter here.
Maybe we need a different interface that allows this relaxation.
So far, I'd suggest fixing nvme_*_connect_io_queues() to ignore a failed
I/O queue; the NVMe host can then still be set up with fewer I/O queues.
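A minimal sketch of what that could look like, assuming a generic connect
loop; nvme_xxx_connect_io_queues() and nvme_xxx_start_queue() are
placeholders for the transport-specific helpers, not actual upstream
function names:

	/*
	 * Hypothetical sketch only -- names are placeholders for the
	 * transport-specific code, not real upstream functions.
	 */
	static int nvme_xxx_connect_io_queues(struct nvme_xxx_ctrl *ctrl)
	{
		int i, ret, connected = 0;

		for (i = 1; i < ctrl->ctrl.queue_count; i++) {
			ret = nvme_xxx_start_queue(ctrl, i);
			if (ret) {
				/* skip this queue instead of failing setup */
				dev_warn(ctrl->ctrl.device,
					 "skipping io queue %d, connect failed (%d)\n",
					 i, ret);
				continue;
			}
			connected++;
		}

		/* still fail if not a single io queue could be connected */
		return connected ? 0 : -EIO;
	}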
What happens when the CPU comes back? Not sure we can simply ignore it.
Otherwise nvme_*_connect_io_queues() could fail easily, especially with a
1:1 queue-to-CPU mapping.