On Wed, Jul 04, 2018 at 10:34:24AM +0300, Leon Romanovsky wrote: > > This is a bit wrong these days.. Now that this is > > get_user_pages_longterm() a NULL vma_list causes it to allocate, so > > the code before isn't right: > > > > /* > > * if we can't alloc the vma_list, it's not so bad; > > * just assume the memory is not hugetlb memory > > */ > > vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL); > > if (!vma_list) > > > > As this now should be a hard failure. > > > > Arguably this should probably just split the page_list allocation in > > half and use half for vmas and half for pages.. > > > > Not related to this patch, but since you are tidying this area.. > > get_user_pages_longterm() doesn't try to allocate whole page, but exact > number of memory upto PAGE_SIZE, which according to code comment above can > be less than __get_free_page() returns. However, I failed to find proof > for that comment and it seems that __get_free_page() will always return page > in the size of PAGE_SIZE. ?? The comment is just saying failure is ignored because the computation it does is optional. > I would recommend to do proposed change in different patch and only > after i40iw and bnxt_re drivers will be fixed to do not use hugetlb. That is another way to handle it > > > > > - > > > + up_read(¤t->mm->mmap_sem); > > > if (ret < 0) > > > goto out; > > > > > > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, > > > cur_base += ret * PAGE_SIZE; > > > npages -= ret; > > > > > > + down_read(¤t->mm->mmap_sem); > > > for_each_sg(sg_list_start, sg, ret, i) { > > > > This seems placed wrong, the up_read was only two lines higher. Surely > > the entire 'while(npages)' should be holding the mmap_sem? Or at least > > each single iteration.. Maybe the loop needs to be moved into a > > function. > > Not really, the down_read() protects access to is_vm_hugetlb_page(vma_list[i] > check. We don't need to protect npages because they can't be changed > concurrently, in opposite to VMA and pages. Grabing and releasing a lock like that with no reason makes very little sense. Further, I don't want to prove the poitners in vma_list are safe to use after dropping the mmap_sem (they probably are not, I'm guessing), so holding the lock for the entire loop is a better choice. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html