Re: [PATCH rdma-next 2/3] IB/umem: Drop hugetlb variable and VMA allocation

On Tue, Mar 21, 2017 at 09:06:44AM -0500, Shiraz Saleem wrote:
> On Tue, Mar 21, 2017 at 10:14:08AM +0200, Leon Romanovsky wrote:
> > On Mon, Mar 20, 2017 at 09:19:36PM -0500, Shiraz Saleem wrote:
> > > On Sun, Mar 19, 2017 at 10:55:56AM +0200, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > >
> > > > The IB umem structure has a hugetlb field which was assigned but, except
> > > > for the i40iw driver, never actually used.
> > > >
> > > > This patch drops that field from struct ib_umem, removes all writes to it,
> > > > gets rid of the VMA allocation which wasn't used, and removes the hugetlb
> > > > check from the i40iw driver, because it is checked anyway in the
> > > > i40iw_set_hugetlb_values() routine.
> > > >
> > > > @@ -143,9 +141,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >
> > > >  	umem->odp_data = NULL;
> > > >
> > > > -	/* We assume the memory is from hugetlb until proved otherwise */
> > > > -	umem->hugetlb   = 1;
> > > > -
> > > >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> > > >  	if (!page_list) {
> > > >  		put_pid(umem->pid);
> > > > @@ -153,14 +148,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  		return ERR_PTR(-ENOMEM);
> > > >  	}
> > > >
> > > > -	/*
> > > > -	 * if we can't alloc the vma_list, it's not so bad;
> > > > -	 * just assume the memory is not hugetlb memory
> > > > -	 */
> > > > -	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> > > > -	if (!vma_list)
> > > > -		umem->hugetlb = 0;
> > > > -
> > > >  	npages = ib_umem_num_pages(umem);
> > > >
> > > >  	down_write(&current->mm->mmap_sem);
> > > > @@ -194,7 +181,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  		ret = get_user_pages(cur_base,
> > > >  				     min_t(unsigned long, npages,
> > > >  					   PAGE_SIZE / sizeof (struct page *)),
> > > > -				     gup_flags, page_list, vma_list);
> > > > +				     gup_flags, page_list, NULL);
> > > >
> > > >  		if (ret < 0)
> > > >  			goto out;
> > > > @@ -203,12 +190,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  		cur_base += ret * PAGE_SIZE;
> > > >  		npages   -= ret;
> > > >
> > > > -		for_each_sg(sg_list_start, sg, ret, i) {
> > > > -			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
> > > > -				umem->hugetlb = 0;
> > > > -
> > > > +		for_each_sg(sg_list_start, sg, ret, i)
> > > >  			sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
> > > > -		}
> > > >
> > > >  		/* preparing for next loop */
> > > >  		sg_list_start = sg;
> > >
> > > > diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > > > index 9b2849979756..59c8d74e3704 100644
> > > > --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > > > +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > > > @@ -1850,7 +1850,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
> > > >  	iwmr->page_size = region->page_size;
> > > >  	iwmr->page_msk = PAGE_MASK;
> > > >
> > > > -	if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM))
> > > > +	if (req.reg_type == IW_MEMREG_TYPE_MEM)
> > > >  		i40iw_set_hugetlb_values(start, iwmr);
> > >
> > > The code in ib_umem_get iterates through __all__ VMAs corresponding to the
> > > user-space buffer, and the hugetlb field of the umem struct is set only if
> > > they all use huge pages. So, in essence, umem->hugetlb tells the driver
> > > whether __all__ pages of the memory region use huge pages or not.
> > >
> > > i40iw_set_hugetlb_values only finds the VMA for the start virtual address of
> > > the user-space buffer and checks whether __this__ VMA uses hugetlb pages.
> > > It is a sanity check done on the VMA found.
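
For reference, the start-VMA-only check being described is roughly the
following -- a simplified sketch, not the exact i40iw code:

	struct vm_area_struct *vma = find_vma(current->mm, start);

	if (vma && is_vm_hugetlb_page(vma)) {
		/* only the VMA containing 'start' is inspected here */
		iwmr->page_size = huge_page_size(hstate_vma(vma));
		iwmr->page_msk  = huge_page_mask(hstate_vma(vma));
	}
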
> >
> > >
> > > So the question is -- do we know that only a single VMA represents the
> > > user-space buffer backed by huge pages?
> > > If not, the check in i40iw_set_hugetlb_values does not appear sufficient to
> > > identify whether the entire memory region is composed of huge pages, since
> > > it doesn't check all the VMAs. umem->hugetlb would be required for that.
> >
> > You are right, and thank you for pointing that out. A user memory buffer can
> > indeed span more than one VMA, for example one reg_mr with READ access over
> > two adjacent VMAs, one mapped READ-only and the other READ-WRITE.
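
To make that concrete, a hypothetical userspace sequence like the one below
ends up registering one buffer that spans two VMAs (pd, buf and len are
placeholders, and len is assumed to be a multiple of the huge page size):

	char *buf = mmap(NULL, 2 * len, PROT_READ,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	/* changing the protection of the second half splits the mapping
	 * into two adjacent VMAs */
	mprotect(buf + len, len, PROT_READ | PROT_WRITE);

	/* a single registration now covers both VMAs */
	mr = ibv_reg_mr(pd, buf, 2 * len,
			IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
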
> >
> > However, relying on a hugetlb flag that was set at creation time is not the
> > best approach either, since the mapping can change while the application is
> > running. The mlx5 driver relies on the mlx5_ib_cont_pages() function to
> > determine the actual layout.
> >
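
The general idea there, as a very rough sketch (not mlx5's actual code), is
to derive the usable page size from the pinned pages themselves instead of
from a flag cached at registration time:

	u64 mask = 0, va = umem->address & PAGE_MASK;
	struct scatterlist *sg;
	int i;

	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
		/* any bit where the bus address and the (page-aligned)
		 * VA disagree limits the alignment we can claim */
		mask |= sg_dma_address(sg) ^ va;
		va += sg_dma_len(sg);
		/* interior chunk boundaries must be aligned as well */
		if (i != umem->nmap - 1)
			mask |= va;
	}
	/* the lowest set bit of 'mask' bounds the largest page size that
	 * can describe the whole region */
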
>
> The patch, as is, will break us.

Doug,
Please be sure NOT to take this patch.
Do you want me to resubmit the series without this patch?

> We believe ib_umem_get is the correct place to scan the VMAs and report
> whether the MR is composed of hugetlb pages via the hugetlb field of umem.
> This way it can be done in one place and not in each individual driver.

The problem with this approach is that it is done at the creation of the
umem region. A call to madvise(MADV_NOHUGEPAGE, ...) will clear the
hugetlb property from the VMA without changing the umem->hugetlb variable.
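
For example (a hypothetical userspace sequence; pd, buf and len are
placeholders, and buf is assumed to be huge-page backed at registration
time):

	mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
	/* umem->hugetlb is computed here, once, at registration time */

	madvise(buf, len, MADV_NOHUGEPAGE);
	/* the VMA no longer advertises huge-page backing, but the value
	 * cached in umem->hugetlb is never updated */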

Thanks
