Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async work

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Sat, 12 Jan 2019 18:41:06 +0000

On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe
> > Sent: Saturday, January 12, 2019 11:39 AM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Moni
> > Shoua <monis@xxxxxxxxxxxx>
> > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async
> > work
> > 
> > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote:
> > >
> > >
> > > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> > > > owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Gunthorpe
> > > > Sent: Saturday, January 12, 2019 11:06 AM
> > > > To: Leon Romanovsky <leon@xxxxxxxxxx>
> > > > Cc: linux-rdma@xxxxxxxxxxxxxxx; Moni Shoua <monis@xxxxxxxxxxxx>
> > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr()
> > > > launches async work
> > > >
> > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote:
> > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote:
> > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > > > > >
> > > > > > Work must hold a kref on the ib_device otherwise the dev pointer
> > > > > > can become free before the work runs.
> > > > >
> > > > > How can it occur exactly?
> > > > >
> > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist
> > > > > will drain advise_mr queue before it. Such drain will ensure that
> > > > > no works are running.
> > > >
> > > > Okay, that is fine for the get_device, it can be removed.
> > > >
> > > > Are we OK with the reg_state hunk?
> > > >
> > > > Jason
> > >
> > > Unrelated to your question, I noticed that advise_mr does
> > > uverbs_attr_ptr_get_array_size() for the SGEs.  I think this is
> > > incorrect.  It should have been mr handles whose ownership can be
> > > check against the PID or ucontext that registered those MRs.
> > 
> > The SGE's have lkeys in them, and the lkeys are local to the PD passed in as
> > argument.
> > 
> > So the caller cannot touch resources outside the PD, hopefully the driver
> > implementation checks this..
> >
> pagefault_single_data_segment () invokes __mlx5_mr_lookup() to
> lookup a MR per device and not per PD.  mlx5_ib_advise_mr_prefetch()
> doesn't touch pd.  work item doesn't have PD in it.

Ugh, Moni?

I don't think we should alow processes to page fault outside their PD,
that sounds like an avenue for a side-channel security problem.

> After that MR got reallocated to other process before work item can
> do the work and now it fault's some other process.

The locking around mr_srcu and MR lifetime is all broken, so this is
not too surprising..

> To avoid this, instead of overloading dev->mr_srcu, this work item
> should be enqueued to advise_mr_wq and that should be flushed during
> destroy_mr() which ensures that nothing dangling for this MR.  Such
> code will create right synchronization without depending on
> randomness of key, number of MR etc parameters.

Moni?

Jason