Re: [PATCH rdma-next v1 1/2] RDMA/umem: Don't hold mmap_sem for too long

Leon Romanovsky <leon@xxxxxxxxxx> · Wed, 4 Jul 2018 19:25:07 +0300

On Wed, Jul 04, 2018 at 08:53:54AM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 04, 2018 at 10:34:24AM +0300, Leon Romanovsky wrote:
>
> > > This is a bit wrong these days.. Now that this is
> > > get_user_pages_longterm() a NULL vma_list causes it to allocate, so
> > > the code before isn't right:
> > >
> > > 	/*
> > > 	 * if we can't alloc the vma_list, it's not so bad;
> > > 	 * just assume the memory is not hugetlb memory
> > > 	 */
> > > 	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> > > 	if (!vma_list)
> > >
> > > As this now should be a hard failure.
> > >
> > > Arguably this should probably just split the page_list allocation in
> > > half and use half for vmas and half for pages..
> > >
> > > Not related to this patch, but since you are tidying this area..
> >
> > get_user_pages_longterm() doesn't try to allocate whole page, but exact
> > number of memory upto PAGE_SIZE, which according to code comment above can
> > be less than __get_free_page() returns. However, I failed to find proof
> > for that comment and it seems that __get_free_page() will always return page
> > in the size of PAGE_SIZE.
>
> ?? The comment is just saying failure is ignored because the
> computation it does is optional.

"just assume the memory is not hugetlb memory"

so what does it mean hugetlb in that context?

>
> > I would recommend to do proposed change in different patch and only
> > after i40iw and bnxt_re drivers will be fixed to do not use hugetlb.
>
> That is another way to handle it

The question is when they will update.
For example, we already had similar discussion
https://patchwork.kernel.org/patch/9632497/

>
> > >
> > > > -
> > > > +		up_read(&current->mm->mmap_sem);
> > > >  		if (ret < 0)
> > > >  			goto out;
> > > >
> > > > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  		cur_base += ret * PAGE_SIZE;
> > > >  		npages   -= ret;
> > > >
> > > > +		down_read(&current->mm->mmap_sem);
> > > >  		for_each_sg(sg_list_start, sg, ret, i) {
> > >
> > > This seems placed wrong, the up_read was only two lines higher. Surely
> > > the entire 'while(npages)' should be holding the mmap_sem? Or at least
> > > each single iteration.. Maybe the loop needs to be moved into a
> > > function.
> >
> > Not really, the down_read() protects access to is_vm_hugetlb_page(vma_list[i]
> > check. We don't need to protect npages because they can't be changed
> > concurrently, in opposite to VMA and pages.
>
> Grabing and releasing a lock like that with no reason makes very
> little sense.
>
> Further, I don't want to prove the poitners in vma_list are safe to
> use after dropping the mmap_sem (they probably are not, I'm guessing),
> so holding the lock for the entire loop is a better choice.

The pointers are safe, the properties are not.
But no problem, I'll update and resend the patch.

Thanks

>
> Jason
Attachment:
signature.asc

Description: PGP signature