Re: [PATCH rdma-next v1 1/2] RDMA/umem: Don't hold mmap_sem for too long

Jason Gunthorpe <jgg@xxxxxxxx> · Wed, 4 Jul 2018 08:53:54 -0600

On Wed, Jul 04, 2018 at 10:34:24AM +0300, Leon Romanovsky wrote:

> > This is a bit wrong these days.. Now that this is
> > get_user_pages_longterm() a NULL vma_list causes it to allocate, so
> > the code before isn't right:
> >
> > 	/*
> > 	 * if we can't alloc the vma_list, it's not so bad;
> > 	 * just assume the memory is not hugetlb memory
> > 	 */
> > 	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> > 	if (!vma_list)
> >
> > As this now should be a hard failure.
> >
> > Arguably this should probably just split the page_list allocation in
> > half and use half for vmas and half for pages..
> >
> > Not related to this patch, but since you are tidying this area..
> 
> get_user_pages_longterm() doesn't try to allocate whole page, but exact
> number of memory upto PAGE_SIZE, which according to code comment above can
> be less than __get_free_page() returns. However, I failed to find proof
> for that comment and it seems that __get_free_page() will always return page
> in the size of PAGE_SIZE.

?? The comment is just saying failure is ignored because the
computation it does is optional.

> I would recommend to do proposed change in different patch and only
> after i40iw and bnxt_re drivers will be fixed to do not use hugetlb.

That is another way to handle it

> >
> > > -
> > > +		up_read(&current->mm->mmap_sem);
> > >  		if (ret < 0)
> > >  			goto out;
> > >
> > > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > >  		cur_base += ret * PAGE_SIZE;
> > >  		npages   -= ret;
> > >
> > > +		down_read(&current->mm->mmap_sem);
> > >  		for_each_sg(sg_list_start, sg, ret, i) {
> >
> > This seems placed wrong, the up_read was only two lines higher. Surely
> > the entire 'while(npages)' should be holding the mmap_sem? Or at least
> > each single iteration.. Maybe the loop needs to be moved into a
> > function.
> 
> Not really, the down_read() protects access to is_vm_hugetlb_page(vma_list[i]
> check. We don't need to protect npages because they can't be changed
> concurrently, in opposite to VMA and pages.

Grabing and releasing a lock like that with no reason makes very
little sense.

Further, I don't want to prove the poitners in vma_list are safe to
use after dropping the mmap_sem (they probably are not, I'm guessing),
so holding the lock for the entire loop is a better choice.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html