The exact kernel version is 4.15.0-23.25, as in
https://packages.ubuntu.com/bionic/linux-image-4.15.0-23-generic
I tried again with the latest Debian unstable (kernel version 4.16.5), and the symptoms remain the same.
Looking at linux-stable, it looks like the suspected missing commit [1] entered in tag v4.16.17, so I'm not sure it exists in your tested kernel. I'm also not sure where I can access the source to verify. [1]
Possible suspect might be:
--
commit 2da36d44a9d54a2c6e1f8da1f7ccc26b0bc6cfec
Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
Date:   Thu Apr 26 11:52:39 2018 +0800

    IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV

    w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
    will not be updated in update_wqe_psn, and the corresponding
    wqe will not be acked in rxe_completer due to its last_psn is
    zero. Finally, the other wqe will also not be able to be acked,
    because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
    is still there. This causes large amount of io timeout when
    nvmeof is over rxe.

    Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

    Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
    Reviewed-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxxx>
    Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
--
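
To make the failure mode in that commit message concrete, here is a rough standalone model of the PSN bookkeeping it describes. This is only a sketch and not the rxe driver code: every name below (model_wqe, MODEL_START_MASK, MODEL_END_MASK, model_update_wqe_psn) and the bit values are made up for illustration. The only things taken from the commit message are that last_psn is recorded only when the opcode carries the start mask, and that it otherwise stays at zero.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the opcode mask bits; the values are
 * illustrative and are NOT the ones used by the rxe driver. */
#define MODEL_START_MASK (1u << 0)
#define MODEL_END_MASK   (1u << 1)
#define MODEL_PSN_MASK   0x00ffffffu  /* PSNs are 24 bits wide */

/* Hypothetical, stripped-down work queue element: only the PSN range. */
struct model_wqe {
    uint32_t first_psn;
    uint32_t last_psn;
};

/* Record the PSN range covered by a WQE. As described in the commit
 * message, the range is written only when the opcode is flagged as
 * starting a message; otherwise the fields keep their old values. */
static void model_update_wqe_psn(struct model_wqe *wqe, uint32_t opcode_mask,
                                 uint32_t req_psn, uint32_t num_pkt)
{
    if (num_pkt == 0)
        num_pkt = 1;  /* a zero-length message still uses one packet */

    if (opcode_mask & MODEL_START_MASK) {
        wqe->first_psn = req_psn;
        wqe->last_psn = (req_psn + num_pkt - 1) & MODEL_PSN_MASK;
    }
}
```

In this model, a single-packet "send only" request at PSN 100 whose opcode mask has only MODEL_END_MASK set leaves a zero-initialized wqe with last_psn 0, while the same call with MODEL_START_MASK | MODEL_END_MASK sets last_psn to 100. The first case corresponds to the stale last_psn that, per the commit message, keeps the wqe from being acked in rxe_completer.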