On Thu, Apr 28, 2022 at 08:31:24AM -0500, Bob Pearson wrote:
> This is a strong constraint on the send queue but is the only sane solution I suspect.
> It implies that not attempting to redo local operations implies that the verbs consumer
> must guarantee that they can safely change the MR/MW state as soon as the operation is
> executed for the first time. This means that either there is a fence or they have seen
> the completion of all IO operations that depend on the memory. It is not clear that all
> test cases obey these rules or that they don't. We should WARN on those situations where
> we can see a violation.

The spec already defines the fencing requirements for this; see for
instance "9.4.1.1.1 Invalidate Operation Ordering":

  3) a SEND with Invalidate operation may impact a previous RDMA READ
     operation. Thus, a requester should not perform a SEND with
     Invalidate while previous RDMA READ operations are still
     outstanding. The requester can set the Fence attribute on a given
     work request such as a SEND with Invalidate in order to ensure that
     previous outstanding RDMA READ operations have completed before
     initiating a subsequent SEND with Invalidate operation.

I have no doubt we have subtle ULP bugs here; we've historically had
bugs with ULPs doing invalidation wrong. The usual mistake is assuming
that the recv completion is sufficient to trigger invalidation. It is
not: the ULP must also see the send completion consuming the rkey
before it triggers invalidation.

It is not guaranteed that the SQ completion will arrive before the RQ
completion, even though causally it seems like that has to be true;
lost packets and other abnormal cases cause problems.

Jason
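To make the quoted 9.4.1.1.1 rule concrete, and the kind of violation Bob suggests WARNing on, here is a minimal sketch of a send queue model. Everything in it (struct sq_model, check_post(), can_initiate(), read_completed()) is hypothetical and is not the rxe or libibverbs API; in libibverbs the relevant pieces would be the IBV_WR_SEND_WITH_INV opcode and the IBV_SEND_FENCE send flag. It only tracks RDMA READs, as the quoted rule does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a send queue, for illustration only. */
enum op { OP_RDMA_READ, OP_SEND, OP_SEND_WITH_INV };

struct sq_model {
	unsigned int outstanding_reads; /* RDMA READs posted, not yet completed */
};

/*
 * Record a posted WR. Returns false (and prints a warning) when a SEND
 * with Invalidate is posted without the Fence attribute while RDMA READs
 * are still outstanding, i.e. the situation 9.4.1.1.1 says to avoid.
 */
static bool check_post(struct sq_model *sq, enum op op, bool fence)
{
	bool ok = true;

	if (op == OP_SEND_WITH_INV && !fence && sq->outstanding_reads > 0) {
		fprintf(stderr, "WARN: SEND with Invalidate without fence, "
			"%u RDMA READ(s) outstanding\n", sq->outstanding_reads);
		ok = false;
	}
	if (op == OP_RDMA_READ)
		sq->outstanding_reads++;
	return ok;
}

/* A fenced WR may only be initiated once all prior RDMA READs completed. */
static bool can_initiate(const struct sq_model *sq, bool fence)
{
	return !fence || sq->outstanding_reads == 0;
}

/* Called when an RDMA READ completes. */
static void read_completed(struct sq_model *sq)
{
	if (sq->outstanding_reads > 0)
		sq->outstanding_reads--;
}
```

Posting an RDMA READ followed by an unfenced SEND with Invalidate makes check_post() return false, while the fenced variant is accepted but can_initiate() holds it back until read_completed() drains the READ.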
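The ULP side of the invalidation rule can be sketched as per-request state that waits for both completions, in whichever order they arrive, before invalidating the MR. The names (struct ulp_req, on_send_completion(), on_recv_completion(), maybe_invalidate()) are hypothetical and not taken from any existing ULP; a real implementation would post something like an IBV_WR_LOCAL_INV work request where the comment indicates.

```c
#include <stdbool.h>

/* Hypothetical per-request ULP state, for illustration only. */
struct ulp_req {
	bool send_done;   /* SQ completion for the SEND that consumed the rkey */
	bool recv_done;   /* RQ completion for the peer's response */
	bool invalidated; /* invalidation already triggered */
};

/* Trigger invalidation only after BOTH completions have been seen. */
static bool maybe_invalidate(struct ulp_req *req)
{
	if (req->send_done && req->recv_done && !req->invalidated) {
		/* Real code would post a local invalidate of the rkey here. */
		req->invalidated = true;
		return true;
	}
	return false;
}

/* The SQ completion may arrive before or after the RQ completion. */
static bool on_send_completion(struct ulp_req *req)
{
	req->send_done = true;
	return maybe_invalidate(req);
}

static bool on_recv_completion(struct ulp_req *req)
{
	req->recv_done = true;
	return maybe_invalidate(req);
}
```

The point of the design is that neither completion handler assumes it runs last: if the RQ completion arrives first (e.g. after a lost ACK delays the SQ completion), invalidation is deferred until the send completion shows up.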