> These errors are either from:
> 1. mapping error on the host side - not sure given we don't see any error
> completions/events from the rdma device. However, can you turn on dynamic
> debug to see QP events?
>
> echo "func nvme_rdma_qp_event +p" > /sys/kernel/debug/dynamic_debug/control

Yes, I can try this out. Will this just print to dmesg or do I need to
collect a log from somewhere?

> The fact that null_blk didn't reproduce this was probably because it is less
> bursty (which can cause network congestion).

See email I just now sent in reply to Max (this same thread). I believe we
reproduced the same issue with null_blk last night after correctly
configuring some latency into the null_blk devices.

> Joseph, are you sure that flow control is correctly configured and working
> reliably?

I believe it is set up correctly. Running ethtool against the NIC
interfaces in use reports:

    Supported pause frame use: Symmetric Receive-only

And all ports in use on the Arista 7060X switch report it turned on in both
directions:

    flowcontrol send on
    flowcontrol receive on

If there's anywhere else we can check, or any direct test of flow control we
can run, happy to try it. Should we be OK with only Rx flow control at the
NIC (this seems to be the default behavior) or is it recommended to set up
Tx flow control as well?
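
One thing worth noting: the "Supported pause frame use" line above only describes what the NIC is capable of, not what is currently enabled. As far as I know, `ethtool -a <iface>` reports the pause parameters actually configured on the interface. A minimal sketch for summarizing that output follows; the `pause_summary` helper name and the `eth0` interface name are placeholders of mine, and it assumes the usual `RX:` / `TX:` line layout that `ethtool -a` prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -a <iface>` output on stdin and
# prints the configured pause state as "rx=<on|off> tx=<on|off>".
# Assumes the usual layout, i.e. lines starting with "RX:" and "TX:".
pause_summary() {
    awk -F: '
        /^RX:/ { gsub(/[ \t]/, "", $2); rx = $2 }   # strip padding around the value
        /^TX:/ { gsub(/[ \t]/, "", $2); tx = $2 }
        END    { printf "rx=%s tx=%s\n", rx, tx }
    '
}

# Usage on a live system (eth0 is a placeholder interface name):
#   ethtool -a eth0 | pause_summary
```

If this shows Tx pause off at the NIC, that would match the Receive-only default I asked about.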