On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote: > > > > From: Jason Gunthorpe > > Sent: Saturday, January 12, 2019 11:39 AM > > To: Parav Pandit <parav@xxxxxxxxxxxx> > > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Moni > > Shoua <monis@xxxxxxxxxxxx> > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async > > work > > > > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote: > > > > > > > > > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma- > > > > owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Gunthorpe > > > > Sent: Saturday, January 12, 2019 11:06 AM > > > > To: Leon Romanovsky <leon@xxxxxxxxxx> > > > > Cc: linux-rdma@xxxxxxxxxxxxxxx; Moni Shoua <monis@xxxxxxxxxxxx> > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() > > > > launches async work > > > > > > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote: > > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote: > > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > > > > > > > > > Work must hold a kref on the ib_device otherwise the dev pointer > > > > > > can become free before the work runs. > > > > > > > > > > How can it occur exactly? > > > > > > > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist > > > > > will drain advise_mr queue before it. Such drain will ensure that > > > > > no works are running. > > > > > > > > Okay, that is fine for the get_device, it can be removed. > > > > > > > > Are we OK with the reg_state hunk? > > > > > > > > Jason > > > > > > Unrelated to your question, I noticed that advise_mr does > > > uverbs_attr_ptr_get_array_size() for the SGEs. I think this is > > > incorrect. It should have been mr handles whose ownership can be > > > check against the PID or ucontext that registered those MRs. > > > > The SGE's have lkeys in them, and the lkeys are local to the PD passed in as > > argument. > > > > So the caller cannot touch resources outside the PD, hopefully the driver > > implementation checks this.. > > > pagefault_single_data_segment () invokes __mlx5_mr_lookup() to > lookup a MR per device and not per PD. mlx5_ib_advise_mr_prefetch() > doesn't touch pd. work item doesn't have PD in it. Ugh, Moni? I don't think we should alow processes to page fault outside their PD, that sounds like an avenue for a side-channel security problem. > After that MR got reallocated to other process before work item can > do the work and now it fault's some other process. The locking around mr_srcu and MR lifetime is all broken, so this is not too surprising.. > To avoid this, instead of overloading dev->mr_srcu, this work item > should be enqueued to advise_mr_wq and that should be flushed during > destroy_mr() which ensures that nothing dangling for this MR. Such > code will create right synchronization without depending on > randomness of key, number of MR etc parameters. Moni? Jason