On Thu, Nov 10, 2022 at 03:29:41PM -0400, Jason Gunthorpe wrote: > On Thu, Nov 10, 2022 at 11:35:37AM +0200, Leon Romanovsky wrote: > > On Wed, Nov 09, 2022 at 03:08:33PM -0400, Jason Gunthorpe wrote: > > > On Mon, Nov 07, 2022 at 10:51:34AM +0200, Leon Romanovsky wrote: > > > > From: Mark Zhang <markzhang@xxxxxxxxxx> > > > > > > > > The MR restrack also needs to be released when delete it, otherwise it > > > > cause memory leak as the task struct won't be released. > > > > > > > > Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects") > > > > Signed-off-by: Mark Zhang <markzhang@xxxxxxxxxx> > > > > Reviewed-by: Michael Guralnik <michaelgur@xxxxxxxxxx> > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > > > > --- > > > > drivers/infiniband/core/restrack.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c > > > > index 1f935d9f6178..01a499a8b88d 100644 > > > > --- a/drivers/infiniband/core/restrack.c > > > > +++ b/drivers/infiniband/core/restrack.c > > > > @@ -343,8 +343,6 @@ void rdma_restrack_del(struct rdma_restrack_entry *res) > > > > rt = &dev->res[res->type]; > > > > > > > > old = xa_erase(&rt->xa, res->id); > > > > - if (res->type == RDMA_RESTRACK_MR) > > > > - return; > > > > > > This needs more explanation, there was some good reason we needed to > > > avoid the wait_for_completion() for the driver allocated objects, but I > > > can't remember it anymore. > > > > > > You added this code in the v2 of the original series, maybe it had > > > something to do with mlx4? > > > > I failed to remember either, but if you want even more magic in your life, > > see this hilarious thread: > > https://lore.kernel.org/linux-rdma/9ba5a611ceac86774d3d0fda12704cecc30606f9.1618753038.git.leonro@xxxxxxxxxx/ > > Oh, that clears it up > > The issue is that dereg can fail for MR: > > rdma_restrack_del(&mr->res); > ret = mr->device->ops.dereg_mr(mr, udata); > if (!ret) { > atomic_dec(&pd->usecnt); > > Because the driver management of the object puts it in the wrong > order. > > The above if is necessary because if we trigger this failure path > without it, then the next attempt to free the MR will trigger a > WARN_ON. Not really, after first entry to rdma_restrack_del(), we will set res->valid to false. Any subsequent calls to rdma_restrack_del() will do nothing. 322 void rdma_restrack_del(struct rdma_restrack_entry *res) 323 { 324 struct rdma_restrack_entry *old; 325 struct rdma_restrack_root *rt; 326 struct ib_device *dev; 327 328 if (!res->valid) { 329 if (res->task) { 330 put_task_struct(res->task); 331 res->task = NULL; 332 } 333 return; <------- exit 334 } Thanks