Re: [PATCH rdma-next 2/4] RDMA/restrack: Release MR restrack when delete

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 10, 2022 at 11:35:37AM +0200, Leon Romanovsky wrote:
> On Wed, Nov 09, 2022 at 03:08:33PM -0400, Jason Gunthorpe wrote:
> > On Mon, Nov 07, 2022 at 10:51:34AM +0200, Leon Romanovsky wrote:
> > > From: Mark Zhang <markzhang@xxxxxxxxxx>
> > > 
> > > The MR restrack also needs to be released when delete it, otherwise it
> > > cause memory leak as the task struct won't be released.
> > > 
> > > Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
> > > Signed-off-by: Mark Zhang <markzhang@xxxxxxxxxx>
> > > Reviewed-by: Michael Guralnik <michaelgur@xxxxxxxxxx>
> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> > > ---
> > >  drivers/infiniband/core/restrack.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> > > index 1f935d9f6178..01a499a8b88d 100644
> > > --- a/drivers/infiniband/core/restrack.c
> > > +++ b/drivers/infiniband/core/restrack.c
> > > @@ -343,8 +343,6 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> > >  	rt = &dev->res[res->type];
> > >  
> > >  	old = xa_erase(&rt->xa, res->id);
> > > -	if (res->type == RDMA_RESTRACK_MR)
> > > -		return;
> > 
> > This needs more explanation, there was some good reason we needed to
> > avoid the wait_for_completion() for the driver allocated objects, but I
> > can't remember it anymore.
> > 
> > You added this code in the v2 of the original series, maybe it had
> > something to do with mlx4?
> 
> I failed to remember either, but if you want even more magic in your life,
> see this hilarious thread:
> https://lore.kernel.org/linux-rdma/9ba5a611ceac86774d3d0fda12704cecc30606f9.1618753038.git.leonro@xxxxxxxxxx/

Oh, that clears it up

The issue is that dereg can fail for MR:

	rdma_restrack_del(&mr->res);
	ret = mr->device->ops.dereg_mr(mr, udata);
	if (!ret) {
		atomic_dec(&pd->usecnt);

Because the driver management of the object puts it in the wrong
order.

The above if is necessary because if we trigger this failure path
without it, then the next attempt to free the MR will trigger a
WARN_ON.

So, no, this can't just be deleted, and the suggestions I gave in that
prior thread still look like they hold up. As would converting the mr
into core ownership. We are very close now, the mlx5 cache mess is
almost cleaned up enough to do it. Perhaps after Michael's series

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux