On Thu, Jul 28, 2022 at 12:54:34PM -0500, Bob Pearson wrote: > On 7/28/22 11:42, Jason Gunthorpe wrote: > > On Mon, Jul 25, 2022 at 03:01:15PM -0500, Bob Pearson wrote: > >> Currently the rxe driver does not guard changes to the mr state > >> against race conditions which can arise from races between > >> local operations and remote invalidate operations. This patch > >> adds a spinlock to the mr object and makes the state changes > >> atomic. > > > > This doesn't make it atomic.. > > > >> + state = smp_load_acquire(&mr->state); > >> + > >> if (unlikely((type == RXE_LOOKUP_LOCAL && mr->lkey != key) || > >> (type == RXE_LOOKUP_REMOTE && mr->rkey != key) || > >> mr_pd(mr) != pd || (access && !(access & mr->access)) || > >> - mr->state != RXE_MR_STATE_VALID)) { > >> + state != RXE_MR_STATE_VALID)) { > >> rxe_put(mr); > > > > This is still just differently racy > > > > The whole point of invalidate is to say that when the invalidate > > completion occurs there is absolutely no touching of the memory that > > MR points to. > > > > I don't see how this acheives this like this. You need a proper lock > > spanning from the lookup here until all the "dma" is completed. > > > > Jason > > Interesting. Then things are in a bit of a mess. Before this patch of course there > was nothing. And, rxe_resp.c currently looks up an mr from the rkey and saves it > in the qp and then uses it for additional packets as required for e.g. rdma write > operations. A local invalidate before a multipacket write finishes will have the wrong > effect. It will continue to use the mr to perform the data copies. And the data copy > routine does not validate the mr state. We would have to save the rkey instead and > re-lookup the mr for each packet. > > For a single packet we complete the dma in a single tasklet call. We would have a choice > of holding a spinlock (for a fairly long time) or marking the mr as busy and deferring a > local invalidate. A remote invalidate would fall between the packets of an rdma op. Some kind of "lock" that delays the invalidate completion until the MR is finished being used is also fairly sensible. In this case you'd probably convert the 'valid' into some kind of refcount and when refcount_put reaches 0 then you'd complete any pending invalidation WR. So on the tasklet side it becomes two atomics - which is about the same overhead as a spinlock. Jason