Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async work

Moni Shoua <monis@xxxxxxxxxxxx> · Mon, 14 Jan 2019 15:09:17 +0200



On Sun, Jan 13, 2019 at 8:18 PM Parav Pandit <parav@xxxxxxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Moni Shoua <monis@xxxxxxxxxxxx>
> > Sent: Sunday, January 13, 2019 10:57 AM
> > To: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > Cc: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky
> > <leon@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> >
> > > Ah! I *thought* I checked this, yes, using the system work queue is
> > > why I added the put_device :)
> > >
> > > > It should be queued to advise_mr_wq.  This will give a chance to flush
> > > > the wq when the IB device is unregistered by the core.
> > >
> > > Good question - Moni??
> > >
> > > Jason
> > Thanks.
> > Using schedule_work() is obviously a bug. The intention was to use the
> > advise_mr_wq. I'll send a fix for that.
> > However, after this is fixed there is no need to get_device(), since it is
> > assured that no work items will be processed after
> > mlx5_ib_stage_init_cleanup() finishes.
>
> Also, you need to either flush the advise_mr_wq or cancel the pending
> work item when an MR is destroyed, to make sure that any page faults
> the user triggered are canceled. Otherwise an MR can get recycled
> (dealloc, then alloc) for the same PD, and the pending work, which
> looks the MR up by MR key, would page fault the new MR, which the
> user didn't ask for.
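
To make the ordering above concrete, here is a minimal userspace sketch of
"cancel the pending work for an MR before destroying it, then flush the
dedicated queue". It is only a toy model under stated assumptions: it is
single-threaded, it does not use the kernel workqueue API, and every name in
it (toy_wq, toy_mr, toy_queue_prefetch, toy_cancel_for_mr, toy_flush,
toy_destroy_mr) is hypothetical and does not exist in mlx5_ib.

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct toy_mr {
	unsigned int key;	/* stands in for the MR key */
	bool alive;		/* false once the MR is destroyed */
	int prefetches;		/* prefetch work items that ran on this MR */
};

struct toy_work {
	struct toy_mr *mr;
	bool pending;
};

/* Stands in for a dedicated queue such as advise_mr_wq. */
struct toy_wq {
	struct toy_work items[MAX_WORK];
	size_t count;
};

/* Queue a prefetch for mr; returns false if the queue is full. */
static bool toy_queue_prefetch(struct toy_wq *wq, struct toy_mr *mr)
{
	if (wq->count == MAX_WORK)
		return false;
	wq->items[wq->count].mr = mr;
	wq->items[wq->count].pending = true;
	wq->count++;
	return true;
}

/* Cancel every pending item that targets mr; returns how many. */
static int toy_cancel_for_mr(struct toy_wq *wq, const struct toy_mr *mr)
{
	int cancelled = 0;

	for (size_t i = 0; i < wq->count; i++) {
		if (wq->items[i].pending && wq->items[i].mr == mr) {
			wq->items[i].pending = false;
			cancelled++;
		}
	}
	return cancelled;
}

/* Run all still-pending items, then empty the queue; returns how many ran. */
static int toy_flush(struct toy_wq *wq)
{
	int ran = 0;

	for (size_t i = 0; i < wq->count; i++) {
		if (!wq->items[i].pending)
			continue;
		/* A work item must never run against a destroyed MR. */
		assert(wq->items[i].mr->alive);
		wq->items[i].mr->prefetches++;
		wq->items[i].pending = false;
		ran++;
	}
	wq->count = 0;
	return ran;
}

/* Destroy an MR: cancel its pending work first, then mark it dead. */
static int toy_destroy_mr(struct toy_wq *wq, struct toy_mr *mr)
{
	int cancelled = toy_cancel_for_mr(wq, mr);

	mr->alive = false;
	return cancelled;
}
```

With two MRs queued and one of them destroyed before the flush, only the
surviving MR's work runs. If toy_destroy_mr() skipped the cancel step, the
assert in toy_flush() would fire, which is the toy equivalent of a stale
work item touching an MR that was deallocated or recycled.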