Re: [PATCH rdma-next v1 1/2] RDMA/umem: Don't hold mmap_sem for too long

On Tue, Jul 03, 2018 at 04:14:58PM -0600, Jason Gunthorpe wrote:
> On Sun, Jul 01, 2018 at 03:48:08PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > 
> > DMA mapping is a time-consuming operation and doesn't need to be
> > performed while the mmap_sem semaphore is held. Limit the scope of
> > that semaphore to the accounting part only.
> > 
> > Signed-off-by: Huy Nguyen <huyn@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > ---
> >  drivers/infiniband/core/umem.c | 22 +++++++++++++---------
> >  1 file changed, 13 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > index 498f59bb4989..5c7ba60848d5 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -84,7 +84,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  	struct ib_umem *umem;
> >  	struct page **page_list;
> >  	struct vm_area_struct **vma_list;
> > -	unsigned long locked;
> >  	unsigned long lock_limit;
> >  	unsigned long cur_base;
> >  	unsigned long npages;
> > @@ -149,15 +148,16 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  
> >  	npages = ib_umem_num_pages(umem);
> >  
> > -	down_write(&current->mm->mmap_sem);
> > -
> > -	locked     = npages + current->mm->pinned_vm;
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >  
> > -	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
> > +	down_write(&current->mm->mmap_sem);
> > +	current->mm->pinned_vm += npages;
> > +	if ((current->mm->pinned_vm > lock_limit) && !capable(CAP_IPC_LOCK)) {
> > +		up_write(&current->mm->mmap_sem);
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> > +	up_write(&current->mm->mmap_sem);
> >  
> >  	cur_base = addr & PAGE_MASK;
> >  
> > @@ -177,11 +177,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  	sg_list_start = umem->sg_head.sgl;
> >  
> >  	while (npages) {
> > +		down_read(&current->mm->mmap_sem);
> >  		ret = get_user_pages_longterm(cur_base,
> >  				     min_t(unsigned long, npages,
> >  					   PAGE_SIZE / sizeof (struct page *)),
> >  				     gup_flags, page_list, vma_list);
> 
> This is a bit wrong these days. Now that this calls
> get_user_pages_longterm(), a NULL vma_list causes it to allocate one
> internally, so the code before isn't right:
> 
> 	/*
> 	 * if we can't alloc the vma_list, it's not so bad;
> 	 * just assume the memory is not hugetlb memory
> 	 */
> 	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> 	if (!vma_list)
> 
> This should now be a hard failure instead.
> 
> Arguably this should probably just split the page_list allocation in
> half and use half for vmas and half for pages..
> 
> Not related to this patch, but since you are tidying this area..
> 
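
One possible shape for that, as a rough sketch only (untested, with the
error unwind abbreviated), carving the vma array out of the same page as
page_list:

	/*
	 * Rough sketch only (untested): reuse the single page for both
	 * arrays instead of a second __get_free_page(), and make the
	 * allocation failure fatal, since get_user_pages_longterm() would
	 * otherwise allocate a vma array internally on every call anyway.
	 */
	page_list = (struct page **) __get_free_page(GFP_KERNEL);
	if (!page_list) {
		/* the real code would also undo the earlier umem allocation */
		return ERR_PTR(-ENOMEM);
	}
	/* second half of the same page holds the vma pointers */
	vma_list = (struct vm_area_struct **)
		(page_list + PAGE_SIZE / (2 * sizeof(struct page *)));

The chunk passed to get_user_pages_longterm() in the pinning loop would
then have to shrink to PAGE_SIZE / (2 * sizeof(struct page *)) as well.
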
> > -
> > +		up_read(&current->mm->mmap_sem);
> >  		if (ret < 0)
> >  			goto out;
> >  
> > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		cur_base += ret * PAGE_SIZE;
> >  		npages   -= ret;
> >
> > +		down_read(&current->mm->mmap_sem);
> >  		for_each_sg(sg_list_start, sg, ret, i) {
> 
> This seems misplaced; the up_read was only two lines higher. Surely
> the entire 'while (npages)' loop should be holding the mmap_sem? Or at
> least each single iteration. Maybe the loop needs to be moved into a
> function.
> 
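
Holding the lock across one full iteration might look roughly like this
(sketch only, untested, not the actual patch), so that vma_list[] is
still valid when it is dereferenced:

	while (npages) {
		down_read(&current->mm->mmap_sem);
		ret = get_user_pages_longterm(cur_base,
				min_t(unsigned long, npages,
				      PAGE_SIZE / sizeof(struct page *)),
				gup_flags, page_list, vma_list);
		if (ret < 0) {
			up_read(&current->mm->mmap_sem);
			goto out;
		}

		umem->npages += ret;
		cur_base += ret * PAGE_SIZE;
		npages   -= ret;

		/* vma_list[] is only safe to look at while mmap_sem is held */
		for_each_sg(sg_list_start, sg, ret, i) {
			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
				umem->hugetlb = 0;

			sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
		}
		up_read(&current->mm->mmap_sem);

		/* preparing for next loop */
		sg_list_start = sg;
	}
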
> >  			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
> >  				umem->hugetlb = 0;
> 
> But, what does this even get used for? Only bnxt_re and i40iw_verbs
> even read this output, and I'm rather skeptical that those drivers are
> using it in a proper way.
> 
> The code in drivers/infiniband/hw/bnxt_re/ib_verbs.c around this sure
> looks wrong.
> 
> And I have no idea what the code in i40iw_set_hugetlb_values() thinks
> it is doing calling find_vma() without locking, or using that result
> to make assumptions about the whole page array. That is clearly wrong
> too.
>

Yes. The lock needs to be held across find_vma() and that's a bug.
But I'm not sure why drivers cannot rely on umem->hugetlb to determine
whether the MR is composed of huge pages. All the VMAs covering the
user-space buffer are scanned to make this determination in ib_umem_get().
The umem->hugetlb flag is checked once in i40iw during MR creation.
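
For the find_vma() part, the fix is presumably just to take mmap_sem for
read around the lookup and not touch the vma after dropping it, roughly
(sketch only; the exact i40iw field names may differ):

	/*
	 * Sketch only: find_vma() must run under mmap_sem, and the returned
	 * vma must not be used after the lock is dropped. iwmr->page_size
	 * is used here just as an illustration of the i40iw consumer.
	 */
	down_read(&current->mm->mmap_sem);
	vma = find_vma(current->mm, addr);
	if (vma && is_vm_hugetlb_page(vma))
		iwmr->page_size = huge_page_size(hstate_vma(vma));
	up_read(&current->mm->mmap_sem);
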

Shiraz   
 