On Sat, Jan 12, 2019 at 8:42 PM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > > On Sat, Jan 12, 2019 at 11:06:51AM -0700, Parav Pandit wrote: > > > > > > > From: Jason Gunthorpe > > > Sent: Saturday, January 12, 2019 11:39 AM > > > To: Parav Pandit <parav@xxxxxxxxxxxx> > > > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Moni > > > Shoua <monis@xxxxxxxxxxxx> > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() launches async > > > work > > > > > > On Sat, Jan 12, 2019 at 10:32:54AM -0700, Parav Pandit wrote: > > > > > > > > > > > > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma- > > > > > owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Gunthorpe > > > > > Sent: Saturday, January 12, 2019 11:06 AM > > > > > To: Leon Romanovsky <leon@xxxxxxxxxx> > > > > > Cc: linux-rdma@xxxxxxxxxxxxxxx; Moni Shoua <monis@xxxxxxxxxxxx> > > > > > Subject: Re: [PATCH for-rc 2/2] IB/mlx5: Fix how advise_mr() > > > > > launches async work > > > > > > > > > > On Sat, Jan 12, 2019 at 05:06:02PM +0200, Leon Romanovsky wrote: > > > > > > On Fri, Jan 11, 2019 at 07:31:24PM -0700, Jason Gunthorpe wrote: > > > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > > > > > > > > > > > Work must hold a kref on the ib_device otherwise the dev pointer > > > > > > > can become free before the work runs. > > > > > > > > > > > > How can it occur exactly? > > > > > > > > > > > > "dev" will be released after mlx5_ib will exist, mlx5_ib exist > > > > > > will drain advise_mr queue before it. Such drain will ensure that > > > > > > no works are running. > > > > > > > > > > Okay, that is fine for the get_device, it can be removed. > > > > > > > > > > Are we OK with the reg_state hunk? > > > > > > > > > > Jason > > > > > > > > Unrelated to your question, I noticed that advise_mr does > > > > uverbs_attr_ptr_get_array_size() for the SGEs. I think this is > > > > incorrect. It should have been mr handles whose ownership can be > > > > check against the PID or ucontext that registered those MRs. > > > > > > The SGE's have lkeys in them, and the lkeys are local to the PD passed in as > > > argument. > > > > > > So the caller cannot touch resources outside the PD, hopefully the driver > > > implementation checks this.. > > > > > pagefault_single_data_segment () invokes __mlx5_mr_lookup() to > > lookup a MR per device and not per PD. mlx5_ib_advise_mr_prefetch() > > doesn't touch pd. work item doesn't have PD in it. > > Ugh, Moni? > > I don't think we should alow processes to page fault outside their PD, > that sounds like an avenue for a side-channel security problem. > I can pass pd from prefetch_mr down to pagefault_single_data_segment() and force that that lookup will be also for pd. However there is another flow that goes through pagefault_single_data_segment() that starts from hardware page-fault. What will be the pd in this case? Perhaps the pd of the source/destination QP of the packet that generated the pagefault? But I can do this only if the type of page-fault is WQE. In this case the hardware provides the QPN. For page-fault of type RDMA the hardware doesn't provide the QPN, only the lkey, address and size (dev is the one that fired the event). So if there is a potential security risk in the advise_mr flow doesn't it exist in the page-fault flow? A malicious packet can do the same thing a malicious process does. What do you think? > > After that MR got reallocated to other process before work item can > > do the work and now it fault's some other process. > > The locking around mr_srcu and MR lifetime is all broken, so this is > not too surprising.. > > > To avoid this, instead of overloading dev->mr_srcu, this work item > > should be enqueued to advise_mr_wq and that should be flushed during > > destroy_mr() which ensures that nothing dangling for this MR. Such > > code will create right synchronization without depending on > > randomness of key, number of MR etc parameters. > > Moni? > > Jason