On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: > > On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > > > On 5/25/21 8:19 PM, Muchun Song wrote: > > > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > >> > > >> Here is a modification to the reservation tracking for fixup on errors. > > >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic > > >> case as well. > > >> > > >> Perhaps use this as a prerequisite for your fix(es)? Pretty sure this > > >> will eliminate the need for the call to hugetlb_unreserve_pages. > > > > > > Hi Mike, > > > > > > It seems like someone is fixing a bug, right? Maybe a link should be > > > placed in the cover letter so that someone can know what issue > > > we are facing. > > > > > > > Thanks Muchun, > > > > I wanted to first see if these patches would work in the code Mina is > > modifying. If this works for Mina, then a more formal patch and request > > for inclusion will be sent. > > > > So a quick test: I apply my patche and yours on top of linus/master, > and I remove the hugetlb_unreserve_pages() call that triggered this > conversation, and run the userfaultfd test, resv_huge_pages underflows > again, so it seems on the surface this doesn't quite work as is. > > Not quite sure what to do off the top of my head. I think I will try > to debug why the 3 patches don't work together and I will fix either > your patch or mine. I haven't taken a deep look yet; I just ran a > quick test. > Ok found the issue. With the setup I described above, the hugetlb_shared test case passes: ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success The non-shared test case is the one that underflows: ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success I've debugged a bit, and this messy hunk 'fixes' the underflow with the non-shared case. (Sorry for the messiness). @@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, */ SetHPageRestoreRsvCnt(page); } else { - rc = vma_needs_reservation(h, vma, address); - if (rc < 0) - /* - * See above comment about rare out of - * memory condition. - */ - SetHPageRestoreRsvCnt(page); - else if (rc) - vma_add_reservation(h, vma, address); - else - vma_end_reservation(h, vma, address); + resv = inode_resv_map(vma->vm_file->f_mapping->host); + if (resv) { + int chg = region_del(resv, idx, idx+1); + VM_BUG_ON(chg); + } The reason being is that on page allocation we region_add() an entry into the resv_map regardless of whether this is a shared mapping or not (vma_needs_reservation() + vma_commit_reservation(), which amounts to region_add() at the end of the day). To unroll back this change on error, we need to region_del() the region_add(). The code removed above doesn't end up calling region_del(), because vma_needs_reservation() returns 0, because region_chg() sees there is an entry in the resv_map, and returns 0. The VM_BUG_ON() is just because I'm not sure how to handle that error. > > I believe this issue has existed since the introduction of hugetlb > > reservations in v2.6.18. Since the bug only shows up when we take error > > paths, the issue may not have been observed. Mina found a similar issue > > in an error path which could also expose this issue. > > -- > > Mike Kravetz