On 4/22/22 16:04, Bob Pearson wrote:
> Local operations in the rdma_rxe driver are not obviously idempotent. But the
> RC retry mechanism backs up the send queue to the point of the wqe that is
> currently being acknowledged and re-walks the sq. Each send or write operation is
> retried, except that the first one is truncated to skip the packets that have
> already been acknowledged. Each read and atomic operation is resent, except that
> read data already received for the first wqe is not requested again. But all the
> local operations are replayed. The problem is local invalidate, which is destructive.
> For example,
>
> sq:	some operation that times out
>	bind mw to mr
>	some other operation
>	invalidate mw
>	invalidate mr
>
> can't be replayed, because invalidating the mr makes the second bind fail.
> There are lots of other examples where things go wrong.
>
> To make things worse, the send queue timer is never cleared, and for typical
> timeout values it goes off every few msec whether or not anything actually failed.
>
> Bob

This looks like an unholy mess. I was looking at it because Lustre on rxe
doesn't work at the moment, and the problems were traced to retry flows (on a
very reliable network) caused by stray timeouts. We see local_invalidate_mr
operations getting retried multiple times, and not all of them succeed because
the caller is remapping the fast MR in the meantime and changing the rkey.

Bob