On Fri, Sep 03, 2021 at 04:13:22PM -0700, Bart Van Assche wrote: > On 9/3/21 3:18 PM, Bob Pearson wrote: > > On 9/2/21 6:38 PM, Jason Gunthorpe wrote: > > > On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: > > > > Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to > > > > working for rxe but there is still one error. After adding MW > > > > support I added a test to local invalidate to check and see if the > > > > l/rkey matched the key actually contained in the MR/MW when local > > > > invalidate is called. This is failing for srp/002 with the key > > > > portion of the rkey off by one. Looking at ib_srp.c I see code that > > > > does in fact increment the rkey by one and also has code that posts > > > > a local invalidate. This was never checked before and is now failing > > > > to match. If I mask off the key portion in the test the whole test > > > > case passes so the other problems appear to have been fixed. If the > > > > increment and invalidate are out of sync this could result in the > > > > error. I suspect this may be a bug in srp. Worst case I can remove > > > > this test but I would rather not. > > > > > > I didn't check the spec, but since SRP works with HW devices I wonder > > > if invalidation is supposed to ignore the variant bits in the mkey? > > > > I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of > > MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before > > that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses > > to the old version of the MR from e.g. replicated packets. It posts these WRs to a send queue but I > > don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate > > and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen > > before the MR is turned on again. > > Hi Bob, > > If there would be any code in the SRP driver that is not compliant with the > IBTA specification then I can fix it. > > Regarding the invalidate work requests submitted by the ib_srp driver: these > are submitted before srp_fr_pool_put() is called. A new registration request > is submitted after srp_fr_pool_get() succeeds. There is one MR pool per RDMA > channel and there is one QP per RDMA channel. In other words, > (re)registration requests are submitted to the same QP as unregistration > requests after local invalidate requests. I think the IBTA requires does not > allow to reorder a local invalidate followed by a fast registration request. Right Jason