On Tue, Oct 17, 2023 at 12:09:31PM -0500, Bob Pearson wrote:

> For qp#167 the call to srp_post_send() is followed by the rxe driver
> processing the send operation and generating a work completion which
> is posted to the send cq but there is never a following call to
> __srp_get_rx_iu() so the cqe is not received by srp and failure.

? I don't see this function in the kernel? __srp_get_tx_iu ?

> I don't yet understand the logic of the srp driver to fix this but
> the problem is not in the rxe driver as far as I can tell.

It looks to me like __srp_get_tx_iu() is following the design pattern
where the send CQ is only polled when it needs to allocate a new send
buffer - ie the send buffers are pre-allocated and cycle through the
queue.

So, it is not surprising this isn't being called if it is hung - the
hang is probably something that is preventing it from even wanting to
send, which is probably a receive side issue.

Following back up from that point to isolate which missing resource
would trigger the send may bring some more clarity.

Alternatively if __srp_get_tx_iu() is failing then perhaps you've run
into an issue where it hit something rare and recovery does not work.

eg this kind of design pattern carries a subtle assumption that the rx
and send CQ are ordered together. Getting an rx completion before the
matching send completion can trigger the unusual scenario where the
send side runs out of resources.

Jason
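
To illustrate the "poll the send CQ only when allocating a send buffer"
pattern described above, here is a minimal userspace sketch in C. It is
a toy model, not the ib_srp code: every name in it (tx_ring, tx_iu,
get_tx_iu, poll_send_cq, hw_complete_send, TX_RING_SIZE) is
hypothetical, and the "CQ" is just an array. It only aims to show why a
send completion can sit unpolled while nothing wants to send, and how
the allocator returns NULL when every buffer is still outstanding.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 4	/* hypothetical count of pre-allocated send buffers */

struct tx_iu {
	int id;
};

struct tx_ring {
	struct tx_iu bufs[TX_RING_SIZE];	/* pre-allocated buffers */
	struct tx_iu *free_list[TX_RING_SIZE];	/* stack of free buffers */
	int free_cnt;
	struct tx_iu *send_cq[TX_RING_SIZE];	/* completions not yet polled */
	int cq_cnt;
};

static void tx_ring_init(struct tx_ring *r)
{
	r->free_cnt = 0;
	r->cq_cnt = 0;
	for (int i = 0; i < TX_RING_SIZE; i++) {
		r->bufs[i].id = i;
		r->free_list[r->free_cnt++] = &r->bufs[i];
	}
}

/* Stand-in for the provider posting a work completion to the send CQ. */
static void hw_complete_send(struct tx_ring *r, struct tx_iu *iu)
{
	assert(r->cq_cnt < TX_RING_SIZE);
	r->send_cq[r->cq_cnt++] = iu;
}

/* Drain the send CQ, returning each completed buffer to the free list. */
static void poll_send_cq(struct tx_ring *r)
{
	while (r->cq_cnt > 0)
		r->free_list[r->free_cnt++] = r->send_cq[--r->cq_cnt];
}

/*
 * The only place the send CQ is polled. If nothing ever asks for a
 * send buffer, completions stay in send_cq indefinitely.
 */
static struct tx_iu *get_tx_iu(struct tx_ring *r)
{
	poll_send_cq(r);
	if (r->free_cnt == 0)
		return NULL;	/* every buffer is still outstanding */
	return r->free_list[--r->free_cnt];
}
```

In this model, calling hw_complete_send() leaves cq_cnt non-zero until
the next get_tx_iu() call, which mirrors the observation that a posted
send completion is not picked up when nothing tries to send. Taking
TX_RING_SIZE buffers without any completions makes get_tx_iu() return
NULL, which is the "send side runs out of resources" case.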