On 9/2/21 6:38 PM, Jason Gunthorpe wrote: > On Thu, Sep 02, 2021 at 04:41:15PM -0500, Bob Pearson wrote: >> Now that for-next is on 5.14.0-rc6+ blktest srp/002 is very close to >> working for rxe but there is still one error. After adding MW >> support I added a test to local invalidate to check and see if the >> l/rkey matched the key actually contained in the MR/MW when local >> invalidate is called. This is failing for srp/002 with the key >> portion of the rkey off by one. Looking at ib_srp.c I see code that >> does in fact increment the rkey by one and also has code that posts >> a local invalidate. This was never checked before and is now failing >> to match. If I mask off the key portion in the test the whole test >> case passes so the other problems appear to have been fixed. If the >> increment and invalidate are out of sync this could result in the >> error. I suspect this may be a bug in srp. Worst case I can remove >> this test but I would rather not. > > I didn't check the spec, but since SRP works with HW devices I wonder > if invalidation is supposed to ignore the variant bits in the mkey? > > Jason > I am a little worried. srp is pretty complex but roughly it looks like it maintains a pool of MRs which it recycles. Each time it reuses the MR it increments the key portion of the rkey. Before that it uses local invalidate WRs to invalidate the MRs presumably to prevent stray accesses to the old version of the MR from e.g. replicated packets. It posts these WRs to a send queue but I don't see where it closes the loop by waiting for a WC so there may be a race between the invalidate and the subsequent map_sg call. The invalidate marks the MR as not usable so this must all happen before the MR is turned on again. Bob