help debugging/understanding the use of rdma-core's rdma_get_send_comp()

Hi folks,

I'm hoping somebody can help explain why rdma_get_send_comp() would
fail with EAGAIN after what looks like a successful send, and how to
either avoid getting into that state or recover from it.

Just like librdmacm/examples/rdma_client.c, the code does the following:

  1. creates the ep
  2. registers memory for send/receive
  3. rdma_post_recv()
  4. rdma_post_send()
  5. rdma_get_send_comp()
  6. rdma_get_recv_comp()

On the network trace, the send generates the correct bytes and the
server acknowledges it with a good ACK. Yet rdma_get_send_comp() seems
to say there is a problem. Furthermore, once in that state,
disconnecting and destroying the ep doesn't get the client unstuck: on
the next ep creation and connect, the first send hits the same
send-completion problem. This state isn't hit right from the start.
The code works normally for a while, sending and receiving RDMA
messages, before it triggers. But once it gets into that state, only a
reboot seems to get the code working again, until the state is reached
once more.

I'm aware that, alternatively, we can create our own event_handler and
call ibv_poll_cq() ourselves, but I'm curious whether there is something
inherently problematic in doing rdma_post_send() followed by
rdma_get_send_comp() that leads to issues.

In general, it's unclear to me how a send completion can fail after it
looks like the message was successfully sent.
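
To illustrate one way EAGAIN can surface without the send itself having
failed: when the CQ is empty, rdma_get_send_comp() falls back to
ibv_get_cq_event(), which reads from the completion channel's file
descriptor. If that descriptor is in non-blocking mode, the read reports
EAGAIN whenever no event is pending yet. Whether that is what happens
here is an assumption; nothing in the trace above confirms it. The
sketch below uses a plain pipe as a stand-in for the channel fd, and
set_nonblocking() and read_event_with_wait() are hypothetical helper
names, not part of rdma-core.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read one byte from fd, treating EAGAIN as
 * "nothing there yet" rather than as a failure. The pipe byte stands
 * in for a completion event; this is not how rdma-core itself waits.
 * Returns 1 if a byte was read, 0 if timeout_ms elapsed with nothing
 * to read, -1 on a real error or end of file. */
static int read_event_with_wait(int fd, char *out, int timeout_ms)
{
    for (;;) {
        ssize_t n = read(fd, out, 1);
        if (n == 1)
            return 1;
        if (n == 0)
            return -1;              /* writer closed: no event will come */
        if (errno == EINTR)
            continue;               /* interrupted, just retry the read */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* genuine error */

        /* The read would block: wait until the fd becomes readable. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0)
            return 0;               /* timed out, still nothing pending */
        if (rc < 0 && errno != EINTR)
            return -1;
        /* Readable (or interrupted): loop around and read again. */
    }
}
```

In a real client the descriptor would be the send completion channel's
fd and the read would be the completion-retrieval call. The point is
only that EAGAIN from a non-blocking descriptor means "try again once
it is readable", which is different from a completion that carries an
error status.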

Thank you.


