On Tue, Jul 03, 2018 at 04:14:58PM -0600, Jason Gunthorpe wrote:
> On Sun, Jul 01, 2018 at 03:48:08PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >
> > DMA mapping is a time-consuming operation and doesn't need to be
> > performed while the mmap_sem semaphore is held. Limit the scope of
> > that semaphore to the accounting part only.
> >
> > Signed-off-by: Huy Nguyen <huyn@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > ---
> >  drivers/infiniband/core/umem.c | 22 +++++++++++++---------
> >  1 file changed, 13 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > index 498f59bb4989..5c7ba60848d5 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -84,7 +84,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  	struct ib_umem *umem;
> >  	struct page **page_list;
> >  	struct vm_area_struct **vma_list;
> > -	unsigned long locked;
> >  	unsigned long lock_limit;
> >  	unsigned long cur_base;
> >  	unsigned long npages;
> > @@ -149,15 +148,16 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >
> >  	npages = ib_umem_num_pages(umem);
> >
> > -	down_write(&current->mm->mmap_sem);
> > -
> > -	locked = npages + current->mm->pinned_vm;
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >
> > -	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
> > +	down_write(&current->mm->mmap_sem);
> > +	current->mm->pinned_vm += npages;
> > +	if ((current->mm->pinned_vm > lock_limit) && !capable(CAP_IPC_LOCK)) {
> > +		up_write(&current->mm->mmap_sem);
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> > +	up_write(&current->mm->mmap_sem);
> >
> >  	cur_base = addr & PAGE_MASK;
> >
> > @@ -177,11 +177,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  	sg_list_start = umem->sg_head.sgl;
> >
> >  	while (npages) {
> > +		down_read(&current->mm->mmap_sem);
> >  		ret = get_user_pages_longterm(cur_base,
> > 			min_t(unsigned long, npages,
> > 			      PAGE_SIZE / sizeof (struct page *)),
> > 			gup_flags, page_list, vma_list);
>
> This is a bit wrong these days.. Now that this is
> get_user_pages_longterm() a NULL vma_list causes it to allocate, so
> the code before isn't right:
>
> 	/*
> 	 * if we can't alloc the vma_list, it's not so bad;
> 	 * just assume the memory is not hugetlb memory
> 	 */
> 	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
> 	if (!vma_list)
>
> As this now should be a hard failure.
>
> Arguably this should probably just split the page_list allocation in
> half and use half for vmas and half for pages..
>
> Not related to this patch, but since you are tidying this area..

get_user_pages_longterm() doesn't try to allocate a whole page, but only
the exact amount of memory needed, up to PAGE_SIZE, which according to the
code comment above can be less than what __get_free_page() returns.
However, I failed to find proof for that comment, and it seems that
__get_free_page() will always return a page of size PAGE_SIZE.

I would recommend doing the proposed change in a different patch, and
only after the i40iw and bnxt_re drivers are fixed to not use hugetlb.

> > -
> > +		up_read(&current->mm->mmap_sem);
> >  		if (ret < 0)
> >  			goto out;
> >
> > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		cur_base += ret * PAGE_SIZE;
> >  		npages -= ret;
> >
> > +		down_read(&current->mm->mmap_sem);
> >  		for_each_sg(sg_list_start, sg, ret, i) {
>
> This seems placed wrong, the up_read was only two lines higher. Surely
> the entire 'while(npages)' should be holding the mmap_sem? Or at least
> each single iteration.. Maybe the loop needs to be moved into a
> function.

Not really, the down_read() protects access to the
is_vm_hugetlb_page(vma_list[i]) check. We don't need to protect npages
because it can't be changed concurrently, unlike the VMAs and pages.
> >  		if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
> >  			umem->hugetlb = 0;
>
> But, what does this even get used for? Only bnxt_re and i40iw_verbs
> even read this output, and I'm rather skeptical that those drivers are
> using it in a proper way.
>
> The code in drivers/infiniband/hw/bnxt_re/ib_verbs.c around this sure
> looks wrong.
>
> And I have no idea what the code in i40iw_set_hugetlb_values() thinks
> it is doing calling find_vma() without locking, or using that result
> to make assumptions about the whole page array. That is clearly wrong
> too.
>
> Both drivers need to use the page array physical address based
> algorithm like the other drivers use.
>
> Arguably we should be doing a better job of this in core code, I
> think. Hopefuly the two driver authors (cc'd) will fix their drivers
> and we can delete the hugetlb value..

+1

> Jason