On Wed, Jul 04, 2018 at 08:53:54AM -0600, Jason Gunthorpe wrote: > On Wed, Jul 04, 2018 at 10:34:24AM +0300, Leon Romanovsky wrote: > > > > This is a bit wrong these days.. Now that this is > > > get_user_pages_longterm() a NULL vma_list causes it to allocate, so > > > the code before isn't right: > > > > > > /* > > > * if we can't alloc the vma_list, it's not so bad; > > > * just assume the memory is not hugetlb memory > > > */ > > > vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL); > > > if (!vma_list) > > > > > > As this now should be a hard failure. > > > > > > Arguably this should probably just split the page_list allocation in > > > half and use half for vmas and half for pages.. > > > > > > Not related to this patch, but since you are tidying this area.. > > > > get_user_pages_longterm() doesn't try to allocate whole page, but exact > > number of memory upto PAGE_SIZE, which according to code comment above can > > be less than __get_free_page() returns. However, I failed to find proof > > for that comment and it seems that __get_free_page() will always return page > > in the size of PAGE_SIZE. > > ?? The comment is just saying failure is ignored because the > computation it does is optional. "just assume the memory is not hugetlb memory" so what does it mean hugetlb in that context? > > > I would recommend to do proposed change in different patch and only > > after i40iw and bnxt_re drivers will be fixed to do not use hugetlb. > > That is another way to handle it The question is when they will update. For example, we already had similar discussion https://patchwork.kernel.org/patch/9632497/ > > > > > > > > - > > > > + up_read(¤t->mm->mmap_sem); > > > > if (ret < 0) > > > > goto out; > > > > > > > > @@ -189,12 +190,14 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, > > > > cur_base += ret * PAGE_SIZE; > > > > npages -= ret; > > > > > > > > + down_read(¤t->mm->mmap_sem); > > > > for_each_sg(sg_list_start, sg, ret, i) { > > > > > > This seems placed wrong, the up_read was only two lines higher. Surely > > > the entire 'while(npages)' should be holding the mmap_sem? Or at least > > > each single iteration.. Maybe the loop needs to be moved into a > > > function. > > > > Not really, the down_read() protects access to is_vm_hugetlb_page(vma_list[i] > > check. We don't need to protect npages because they can't be changed > > concurrently, in opposite to VMA and pages. > > Grabing and releasing a lock like that with no reason makes very > little sense. > > Further, I don't want to prove the poitners in vma_list are safe to > use after dropping the mmap_sem (they probably are not, I'm guessing), > so holding the lock for the entire loop is a better choice. The pointers are safe, the properties are not. But no problem, I'll update and resend the patch. Thanks > > Jason
Attachment:
signature.asc
Description: PGP signature