Re: bug report for rdma_rxe

On Thu, Apr 28, 2022 at 08:31:24AM -0500, Bob Pearson wrote:

> This is a strong constraint on the send queue but is the only sane solution I suspect.
> Not attempting to redo local operations implies that the verbs consumer
> must guarantee that they can safely change the MR/MW state as soon as the operation is
> executed for the first time. This means that either there is a fence or they have seen
> the completion of all IO operations that depend on the memory. It is not clear that all
> test cases obey these rules or that they don't. We should WARN on those situations where
> we can see a violation.

The spec defines the fencing requirements for this already, see for
instance "9.4.1.1.1 Invalidate Operation Ordering":

 3) a SEND with Invalidate operation may impact a previous RDMA
 READ operation. Thus, a requester should not perform a SEND with
 Invalidate while previous RDMA READ operations are still outstanding.
 The requester can set the Fence attribute on a given work
 request such as a SEND with Invalidate in order to ensure that
 previous outstanding RDMA READ operations have completed before
 initiating a subsequent SEND with Invalidate operation.
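
To make the quoted rule concrete, here is a toy model of the Fence attribute in plain C. It is only a sketch: struct toy_wr, enum wr_kind and toy_wr_may_start() are hypothetical names invented for illustration and are not rxe or verbs code. In libibverbs the attribute corresponds to setting IBV_SEND_FENCE in send_flags of the work request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work-request kinds, for this toy model only. */
enum wr_kind { WR_RDMA_READ, WR_SEND_WITH_INV };

struct toy_wr {
	enum wr_kind kind;
	bool fence;      /* models the Fence attribute on this WR */
	bool completed;  /* true once the operation has completed */
};

/*
 * Return true if sq[idx] may start executing now.
 * A fenced WR waits until every earlier RDMA READ has completed;
 * an unfenced WR is not held back by this check.
 */
bool toy_wr_may_start(const struct toy_wr *sq, size_t idx)
{
	assert(sq != NULL);

	if (!sq[idx].fence)
		return true;

	for (size_t i = 0; i < idx; i++) {
		if (sq[i].kind == WR_RDMA_READ && !sq[i].completed)
			return false;
	}
	return true;
}
```

In this model an unfenced SEND with Invalidate behind an outstanding RDMA READ is allowed to start, which is exactly the case the spec text tells the requester to avoid.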
 
I have no doubt we have subtle ULP bugs here. We've historically had
bugs with ULPs doing invalidation wrong. The usual mistake is
assuming that the recv completion is sufficient to trigger
invalidation. It is not: the ULP must also see the send
completion consuming the rkey before it triggers invalidation.

It is not guaranteed that the SQ completion will arrive before the RQ
completion. Even though it seems like causally that has to be true,
lost packets and other abnormal cases can break that ordering.
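
As a rough illustration of that rule, a ULP can gate invalidation on both completions and accept them in either order. This is a minimal sketch in plain C, assuming one flag per completion; struct ulp_req and the ulp_on_*() helpers are hypothetical names, not taken from any real ULP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request state kept by a ULP. */
struct ulp_req {
	bool send_done;    /* SQ completion seen for the send using the rkey */
	bool recv_done;    /* RQ completion seen for the peer's reply */
	bool invalidated;  /* invalidation has already been triggered */
};

/* Trigger invalidation only once, and only after both completions. */
static bool ulp_try_invalidate(struct ulp_req *req)
{
	if (req->invalidated || !req->send_done || !req->recv_done)
		return false;
	req->invalidated = true;  /* a real ULP would start invalidation here */
	return true;
}

/* Called from the SQ completion handler; returns true if it invalidated. */
bool ulp_on_send_completion(struct ulp_req *req)
{
	assert(req != NULL);
	req->send_done = true;
	return ulp_try_invalidate(req);
}

/* Called from the RQ completion handler; returns true if it invalidated. */
bool ulp_on_recv_completion(struct ulp_req *req)
{
	assert(req != NULL);
	req->recv_done = true;
	return ulp_try_invalidate(req);
}
```

Whichever handler runs second is the one that triggers invalidation, so the sketch does not depend on the SQ completion arriving first.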

Jason


