Re: [PATCH rdma-next] RDMA/i40w: Hold read semaphore while looking after VMA

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Thu, 26 Jul 2018 10:46:09 -0600

On Thu, Jul 26, 2018 at 11:34:42AM -0500, Shiraz Saleem wrote:
> On Mon, Jul 23, 2018 at 04:29:04PM -0600, Jason Gunthorpe wrote:
> > On Mon, Jul 23, 2018 at 05:19:41PM -0500, Shiraz Saleem wrote:
> > > On Mon, Jul 23, 2018 at 02:39:56PM -0600, Jason Gunthorpe wrote:
> > > > On Sat, Jul 21, 2018 at 08:31:30AM -0500, Shiraz Saleem wrote:
> > > > > On Sun, Jul 01, 2018 at 10:36:24AM -0600, Leon Romanovsky wrote:
> > > > > > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > > > > 
> > > > > > VMA lookup is supposed to be performed while mmap_sem is held.
> > > > > > 
> > > > > > Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
> > > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > > > >
> > > > > Thanks for fixing this bug.
> > > > > 
> > > > > Acked-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx> 
> > > > 
> > > > This code in the driver all looks wrong to me, it should not rely on
> > > > the huge pages flag and it shouldn't do this VM walking.
> > > >
> > > Hi Jason - Why is that we cant use umem->hugetlb to determine if
> > > MR is composed of huge pages? Isnt that why the flag was defined for
> > > in the first place. All the VMAs w.r.t the user-space buffer are
> > > scanned to make this determination in ib_umem_get.
> > > Isnt it better to use it than implement it in individual driver.
> > 
> > It was a mistake to have this flag and all drivers except i40iw and
> > bnxt don't use it. I wish to remove it.
> I see. What was the drawback to the approach?

It is just wongly designed.. page merging is about combining physical
addresses within a HW supported size list, not about matching the CPU
huge page size(s) to HW.

We should never even care about 'huge pages' in the VM sense, and
certainly never scan a VM, and it is buggy/racy to do so, I
think. Leon already noticed missing locking, but I think there is
worse here since we have desync'd with the lock that get_user_pages
was run under.

> > Drivers are supposed to upgrade the page order by looking for runs of
> > suitable physical addresses after DMA translation, not by inspecting
> > the VMA. Something like mlx5_ib_cont_pages() and related
> > 
> > The core could provde support code for this common algorithm..
>
> It is a great idea for core to replace the flag with the common algorithm
> and avoid duplicating in every driver. I’m not in favor of changing the driver
> just to get rid of the flag and then again to use the common algorithm in core.

Well, someone will need to figure out what that core bit should look
like, and turns out this driver is buggy in this area, so.... :)

I think this would probably be something like passing in a bitmap (?)
of supported page sizes to ib_umem_get() and then ib_umem_get would
combine pages when it builds up the s/g list.

So each driver driver gets a minimum set of s/g entries within its
set of supported page sizes, and doesn't have to do any more work.

The algorithm is a bit tricky though, I think?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html