> Hello everyone,
>
> I'm currently having a performance issue synchronizing two different
> nodes with a simple ping/pong algorithm.
> I currently have two simple pieces of code that summarize my issue:
>
> The first one works as intended, and loops as follows on both the
> client and server sides:
> - post a send work request
> - post a receive work request
> - wait for both completions, acknowledge them and continue.
> This little program works as intended, and I'm able to complete 100k
> requests in 2–3 seconds.
>
> However, the second code is as follows:
> The client is identical to the first code.
> The server does:
> - post a receive work request
> - wait for its completion and acknowledge it
> - post a send work request
> - wait for its completion and acknowledge it
> When I do this, it happens that a request can take up to 2 seconds to
> complete (most of it inside "ibv_get_cq_event()").
> Furthermore, we observed that this happens more often when multiple
> threads try to do this in sync (unlike the first code).

Hey,

The problem you are experiencing is that the responder has no receive
buffer posted when the incoming Send packet arrives. That is why you do
not see it for Writes and Reads.

To check this, you can configure your RC QP with zero RNR NAK
retransmits (rnr_retry = 0). With that setting, an RNR NAK shows up as
a failed Send completion instead of a silent delay.

You may wonder why you experience it. The key is that a Send WC is
generated after an RTT (as the WC indicates reliable reception of the
data), whereas a Receive WC is generated as soon as the data arrives
(as the Receive WR is consumed).

The new code has the following problem: the client waits for a Send WC
before posting a new Receive WR, but the server posts its Send WR as
soon as it sees a Receive WC. As a result, the client has not posted a
Receive WR in time (it saw its Send WC right before the incoming Send
message), which incurs an RNR NAK retransmit.

- Konstantin
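The race described above can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not code from the thread and not libibverbs code: the struct, its fields, and the function names are hypothetical, ACK coalescing and retransmit timers are ignored, and times are in arbitrary microseconds counted from the moment the client posts its Send WR.

```c
#include <stdbool.h>

/* Hypothetical timing parameters; none of these values come from the thread. */
struct timing {
    double one_way_us;     /* wire latency in one direction */
    double server_turn_us; /* server: Receive WC seen -> reply Send WR posted */
    double client_wake_us; /* client: Send WC generated -> Receive WR posted
                              (e.g. wake-up latency of a blocked thread) */
};

/* Time at which the server's reply reaches the client: the request travels
 * one way, the server turns it around, and the reply travels back. */
static double reply_arrival_us(const struct timing *t)
{
    return t->one_way_us + t->server_turn_us + t->one_way_us;
}

/* Time at which the client has a Receive WR posted if it first waits for its
 * Send WC. The Send WC needs the ACK to come back, i.e. one round trip. */
static double recv_posted_after_send_wc_us(const struct timing *t)
{
    return 2.0 * t->one_way_us + t->client_wake_us;
}

/* Returns true if the reply arrives while no Receive WR is posted, which is
 * the situation that triggers an RNR NAK. */
static bool reply_hits_rnr(const struct timing *t, bool recv_posted_before_send)
{
    if (recv_posted_before_send)
        return false; /* buffer is already there when the reply arrives */
    return reply_arrival_us(t) < recv_posted_after_send_wc_us(t);
}
```

In this model the reply loses the race whenever the server turns the message around faster than the client gets from its Send WC to posting the next Receive WR. Posting the Receive WR before the Send WR removes the race in the model, because the buffer no longer depends on the Send WC at all.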