On 2022/4/23 5:04, Bob Pearson wrote:
Local operations in the rdma_rxe driver are not obviously idempotent. But the RC retry mechanism backs up the send queue to the wqe that is currently being acknowledged and re-walks the sq. Each send or write operation is retried, except that the first one is truncated by the packets that have already been acknowledged. Each read and atomic operation is resent, except that read data already received for the first wqe is not requested again. But all the local operations are replayed. The problem is local invalidate, which is destructive. For example:
Is there an actual example, or is this just your analysis? Your analysis is not always correct. To support it, please show us a solid example.

Zhu Yanjun
sq:     some operation that times out
        bind mw to mr
        some other operation
        invalidate mw
        invalidate mr

can't be replayed because invalidating the mr makes the second bind fail. There are lots of other examples where things go wrong.

To make things worse the send queue timer is never cleared and for typical timeout values goes off every few msec whether anything actually failed.

Bob
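
The sequence in Bob's example can be sketched as a toy model. This is not rdma_rxe code: ToyState, execute, and walk are hypothetical names, and the only rule assumed is the one stated in the message, namely that a bind fails once the mr has been invalidated. The sketch walks the same send queue twice, the way a retry that backs up to the first wqe would.

```python
# Toy model only. The names below are hypothetical and do not correspond
# to rdma_rxe data structures or functions.

class ToyState:
    """Tracks whether the mr is still valid and whether the mw is bound."""

    def __init__(self):
        self.mr_valid = True
        self.mw_bound = False


def execute(state, op):
    """Apply one send queue entry; return False if a local operation fails."""
    if op == "bind mw to mr":
        if not state.mr_valid:
            return False  # assumed rule: cannot bind to an invalidated mr
        state.mw_bound = True
    elif op == "invalidate mw":
        state.mw_bound = False
    elif op == "invalidate mr":
        state.mr_valid = False  # destructive: a replay cannot undo this
    # Any other entry stands in for a send/write/read with no local side effect.
    return True


def walk(state, sq):
    """Walk the queue in order and stop at the first failing entry.

    Returns the index of the failing entry, or None if every entry succeeds.
    """
    for index, op in enumerate(sq):
        if not execute(state, op):
            return index
    return None


sq = [
    "some operation that times out",
    "bind mw to mr",
    "some other operation",
    "invalidate mw",
    "invalidate mr",
]

state = ToyState()
first_pass = walk(state, sq)  # local operations run while the first wqe is unacked
replay = walk(state, sq)      # the retry backs up to the first wqe and re-walks the sq

print("first pass failed at:", first_pass)
print("replay failed at:", replay)
```

In this model the first pass completes, while the replay stops at the bind (index 1) because the mr was already invalidated by the first pass.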