On 1/7/2023 6:13 AM, Haeuptle, Michael wrote:
Hello,

I'm running into an issue where rdma_create_qp_ex returns EINVAL, and I was hoping that someone could help me understand what is going on here.

The function that actually fails with EINVAL is the write() call in rdma_init_qp_attr (which is called by rdma_create_qp_ex):

    ...
    ret = write(id->channel->fd, &cmd, sizeof cmd);
    ...

It returns -1 and sets errno to 22. Note that this is an intermittent error and not always reproducible.

The setup and scenario are as follows:

- SPDK NVMF target on Debian 11.3 with top-of-tree rdma-core libs
- NVMe-oF kernel initiator, Debian 11.5 (no change in rdma-core libs)
- There is a switch between the initiator and the SPDK NVMF targets
- The kernel initiator is talking to 2 SPDK NVMF targets via DM and round-robin (I don't think this matters)
- On the initiator system there is a 512k block size fio load against 48 NVMF subsystems (2 target apps with 24 subsystems each)
- When I kill the SPDK target and restart it, I occasionally get this EINVAL on one of the queue pairs

It's unclear to me why the write call is returning EINVAL. The file descriptor should be valid since I see the same fd in later qpair creation requests.

Any insights are appreciated.

-- Michael

Maybe the CM is in a state in which init_qp_attr cannot be performed? Do we know the QP state and the CM state? (A sniffer capture would be needed to check the last CM packet received/sent.) The file descriptor should be irrelevant.
If you are able to debug the kernel, it may be worth instrumenting rdma_init_qp_attr() in drivers/infiniband/core/cma.c to see where this EINVAL is returned and why.
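
To illustrate why the file descriptor is probably not the culprit: a write() on an invalid descriptor fails with EBADF (9), not EINVAL (22), so errno 22 suggests the descriptor was accepted and the request itself was rejected further down. Below is a minimal sketch in plain C; write_errno() and errno_blames_fd() are hypothetical helper names made up for this example and are not part of rdma-core or SPDK.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an rdma-core API): perform a write() and
 * return 0 on success, or the errno value that write() set on failure. */
static int write_errno(int fd, const void *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);
    return ret < 0 ? errno : 0;
}

/* Hypothetical helper: returns 1 if the errno value points at the
 * descriptor itself (EBADF), 0 otherwise. EINVAL falls in the second
 * group: the descriptor was usable, but the request was refused. */
static int errno_blames_fd(int err)
{
    return err == EBADF;
}
```

For example, write_errno(-1, &c, 1) yields EBADF, for which errno_blames_fd() returns 1, whereas the errno 22 reported above would make it return 0. That is consistent with the suggestion to look at the CM/QP state rather than the fd, though it does not by itself say which check in the kernel failed.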