Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone

Jason Gunthorpe <jgg@xxxxxxxxxx> · Sun, 25 Apr 2021 10:08:57 -0300

On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> > 
> > > I didn't understand it when I reviewed it either, but decided to post it
> > > anyway to get a possible explanation for this
> > > RDMA_RESTRACK_MR || RDMA_RESTRACK_QP check.
> > 
> > I think the whole thing should look more like this and we delete the
> > if entirely.
> 
> I have mixed feelings about this approach. Before the "destroy can fail"
> disaster, the restrack goal was to provide the following flow:
> 1. create new memory object - rdma_restrack_new()
> 2. create new HW object - .create_XXX() callback in the driver
> 3. add HW object to the DB - rdma_restrack_add()
> ....
> 4. wait for any work on this HW object to complete - rdma_restrack_del()
> 5. safely destroy HW object - .destroy_XXX()
> 
> I really would like to stay with this flow and block any access to the
> object that failed to destruct - maybe add to some zombie list.

That isn't the semantic we now have for destroy.
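
For reference, the five-step flow quoted above can be modelled as a toy,
single-threaded userspace sketch. Everything named toy_* is hypothetical and
is not the restrack API. In particular, toy_del() only reports whether the
users have drained, where real code would sleep until they have, and the
"HW object" is just a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tracked object; not a kernel structure. */
struct toy_entry {
	int users;      /* 1 for the owner, plus 1 per active user */
	bool in_db;     /* visible to lookups */
	bool hw_alive;  /* stands in for the driver's HW object */
};

/* Step 1: create the memory object; the owner holds the first reference. */
static void toy_new(struct toy_entry *e)
{
	e->users = 1;
	e->in_db = false;
	e->hw_alive = false;
}

/* Step 2: create the HW object (always succeeds in this model). */
static int toy_create_hw(struct toy_entry *e)
{
	e->hw_alive = true;
	return 0;
}

/* Step 3: publish the object so that lookups can find it. */
static void toy_add(struct toy_entry *e)
{
	e->in_db = true;
}

/* A user takes a reference; fails once the object is no longer published. */
static bool toy_get(struct toy_entry *e)
{
	if (!e->in_db)
		return false;
	e->users++;
	return true;
}

static void toy_put(struct toy_entry *e)
{
	e->users--;
}

/*
 * Step 4: hide the object and drop the owner's reference.
 * Returns true when no users remain; a real implementation would
 * block here until that is the case.
 */
static bool toy_del(struct toy_entry *e)
{
	e->in_db = false;
	toy_put(e);
	return e->users == 0;
}

/* Step 5: destroy the HW object; only legal once step 4 has completed. */
static void toy_destroy_hw(struct toy_entry *e)
{
	assert(e->users == 0 && !e->in_db);
	e->hw_alive = false;
}

/* Walk the whole flow once; returns 0 on success. */
static int toy_demo_flow(void)
{
	struct toy_entry qp;

	toy_new(&qp);
	if (toy_create_hw(&qp))
		return -1;
	toy_add(&qp);

	if (!toy_get(&qp))      /* some user touches the object ... */
		return -1;
	toy_put(&qp);           /* ... and finishes */

	if (!toy_del(&qp))      /* all users are gone */
		return -1;
	if (toy_get(&qp))       /* new lookups must fail after del */
		return -1;

	toy_destroy_hw(&qp);
	return qp.hw_alive ? -1 : 0;
}
```

The model assumes step 5 cannot fail. Once destroy is allowed to fail, the
object has already been hidden in step 4, and that is the case this flow
does not cover.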

> The proposed prepare/abort/finish flow is much harder to implement correctly.
> Take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> but don't restore the MRs after a .destroy_qp() failure.

I think it is a bug that we call rdma_rw code in a user path.

Jason