Re: [PATCH rdma-next 2/3] IB/umem: Drop hugetlb variable and VMA allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 20, 2017 at 09:19:36PM -0500, Shiraz Saleem wrote:
> On Sun, Mar 19, 2017 at 10:55:56AM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >
> > IB umem structure has field hugetlb which was assigned, but except
> > i40iw driver, it was actually never used.
> >
> > This patch drops that variable out of the struct ib_umem, removes
> > all writes to it, get rids of VMA allocation which wasn't used and
> > removes check hugetlb from i40iw driver, because it is checked anyway in
> > i40iw_set_hugetlb_values() routine.
> >
> > @@ -143,9 +141,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >
> >  	umem->odp_data = NULL;
> >
> > -	/* We assume the memory is from hugetlb until proved otherwise */
> > -	umem->hugetlb   = 1;
> > -
> >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> >  	if (!page_list) {
> >  		put_pid(umem->pid);
> > @@ -153,14 +148,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> >
> > -	/*
> > -	 * if we can't alloc the vma_list, it's not so bad;
> > -	 * just assume the memory is not hugetlb memory
> > -	 */
> > -	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> > -	if (!vma_list)
> > -		umem->hugetlb = 0;
> > -
> >  	npages = ib_umem_num_pages(umem);
> >
> >  	down_write(&current->mm->mmap_sem);
> > @@ -194,7 +181,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		ret = get_user_pages(cur_base,
> >  				     min_t(unsigned long, npages,
> >  					   PAGE_SIZE / sizeof (struct page *)),
> > -				     gup_flags, page_list, vma_list);
> > +				     gup_flags, page_list, NULL);
> >
> >  		if (ret < 0)
> >  			goto out;
> > @@ -203,12 +190,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		cur_base += ret * PAGE_SIZE;
> >  		npages   -= ret;
> >
> > -		for_each_sg(sg_list_start, sg, ret, i) {
> > -			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
> > -				umem->hugetlb = 0;
> > -
> > +		for_each_sg(sg_list_start, sg, ret, i)
> >  			sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
> > -		}
> >
> >  		/* preparing for next loop */
> >  		sg_list_start = sg;
>
> > diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > index 9b2849979756..59c8d74e3704 100644
> > --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> > @@ -1850,7 +1850,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
> >  	iwmr->page_size = region->page_size;
> >  	iwmr->page_msk = PAGE_MASK;
> >
> > -	if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM))
> > +	if (req.reg_type == IW_MEMREG_TYPE_MEM)
> >  		i40iw_set_hugetlb_values(start, iwmr);
>
> The code in ib_umem_get iterates through __all__ VMA’s corresponding to the
> user-space buffer and hugetlb field of the umem struct is set only if they
> are all using huge_pages. So in the essence, umem->hugetlb provides the
> driver info on whether __all__ pages of the memory region uses huge pages
> or not.
>
> i40iw_set_hugetlb_values only finds the VMA w.r.t the start virtual address of
> the user-space buffer and checks if __this__ VMA uses hugetlb pages or not.
> It is a sanity check done on the VMA found.

>
> So the question is -- Do we know that only a single VMA represents the user-space
> buffer backed by huge pages?
> If not, it doesnt appear that the check in i40iw_set_hugetlb_values will suffice to
> identify if the entire memory region is composed of huge pages since it doesnt
> check all the VMAs. umem->hugetlb would be required for it.

You are right and thank you for pointing that. User memory buffer indeed can
span more than one VMA. For example one reg_mr with READ property for
two adjacent VAMs, one with READ properties and another with READ-WRITE.

However, relying on hugetlb which was set during creation is not best
thing too. It can be changed during the application run. The mlx5 relies on
the mlx5_ib_cont_pages() function to understand the actual layout.

Thanks

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux