On 5/26/21 7:48 PM, Mina Almasry wrote: > On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: >> >> On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: >>> >>> On 5/25/21 8:19 PM, Muchun Song wrote: >>>> On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: >>>>> >>>>> Here is a modification to the reservation tracking for fixup on errors. >>>>> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic >>>>> case as well. >>>>> >>>>> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this >>>>> will eliminate the need for the call to hugetlb_unreserve_pages. >>>> >>>> Hi Mike, >>>> >>>> It seems like someone is fixing a bug, right? Maybe a link should be >>>> placed in the cover letter so that someone can know what issue >>>> we are facing. >>>> >>> >>> Thanks Muchun, >>> >>> I wanted to first see if these patches would work in the code Mina is >>> modifying. If this works for Mina, then a more formal patch and request >>> for inclusion will be sent. >>> >> >> So a quick test: I apply my patche and yours on top of linus/master, >> and I remove the hugetlb_unreserve_pages() call that triggered this >> conversation, and run the userfaultfd test, resv_huge_pages underflows >> again, so it seems on the surface this doesn't quite work as is. >> >> Not quite sure what to do off the top of my head. I think I will try >> to debug why the 3 patches don't work together and I will fix either >> your patch or mine. I haven't taken a deep look yet; I just ran a >> quick test. >> > > Ok found the issue. With the setup I described above, the > hugetlb_shared test case passes: > > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 > /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > The non-shared test case is the one that underflows: > > ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 > /tmp/kokonut_test/huge/userfaultfd_test && echo test success > > I've debugged a bit, and this messy hunk 'fixes' the underflow with > the non-shared case. (Sorry for the messiness). > > @@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate > *h, struct vm_area_struct *vma, > */ > SetHPageRestoreRsvCnt(page); > } else { > - rc = vma_needs_reservation(h, vma, address); > - if (rc < 0) > - /* > - * See above comment about rare out of > - * memory condition. > - */ > - SetHPageRestoreRsvCnt(page); > - else if (rc) > - vma_add_reservation(h, vma, address); > - else > - vma_end_reservation(h, vma, address); > + resv = inode_resv_map(vma->vm_file->f_mapping->host); > + if (resv) { > + int chg = region_del(resv, idx, idx+1); > + VM_BUG_ON(chg); > + } > > The reason being is that on page allocation we region_add() an entry > into the resv_map regardless of whether this is a shared mapping or > not (vma_needs_reservation() + vma_commit_reservation(), which amounts > to region_add() at the end of the day). > > To unroll back this change on error, we need to region_del() the region_add(). > > The code removed above doesn't end up calling region_del(), because > vma_needs_reservation() returns 0, because region_chg() sees there is > an entry in the resv_map, and returns 0. > > The VM_BUG_ON() is just because I'm not sure how to handle that error. > Thanks Mina! Yes, that new block of code in restore_reserve_on_error is incorrect for the private mapping case. Since alloc_huge_page does the region_add for both shared and private mappings, it seems we should just do the region_del for both. I'll update this patch to fix this and take your other comments into account. -- Mike Kravetz