On Mon, May 31, 2021 at 4:25 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> > On UFFDIO_COPY, if we fail to copy the page contents while holding the
> > hugetlb_fault_mutex, we will drop the mutex and return to the caller
> > after allocating a page that consumed a reservation. In this case there
> > may be a fault that double consumes the reservation. To handle this, we
> > free the allocated page, fix the reservations, and allocate a temporary
> > hugetlb page and return that to the caller. When the caller does the
> > copy outside of the lock, we again check the cache, and allocate a page
> > consuming the reservation, and copy over the contents.
> >
> > Test:
> > Hacked the code locally such that resv_huge_pages underflows produce
> > a warning and the copy_huge_page_from_user() always fails, then:
> >
> > ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
> > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> > ./tools/testing/selftests/vm/userfaultfd hugetlb 10
> > 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> >
> > Both tests succeed and produce no warnings. After the
> > test runs number of free/resv hugepages is correct.
>
> Many conflicts here with material that is queued for 5.14-rc1.
>
> How serious is this problem? Is a -stable backport warranted?
>

I've sent 2 similar patches to the list:

1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY"

   This one is sent to -stable and linux-mm and is a fairly simple fix.

2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY"

   Which is this patch. It's a more complicated and not critical fix, so
   not targeted for -stable. It's only sent to linux-mm.

> If we decide to get this into 5.13 (and perhaps -stable) then I can
> take a look at reworking all the 5.14 material on top. If not very
> serious then we could rework this on top of the already queued
> material.
>

I assume given the above we want to rework this on top of the already
queued material. I can upload a v5 that is rebased on top of your
branch. Note that you have an earlier version of this fix in your
branch, so really this patch will turn into a fix for that patch if I
rebase it (I assume that's fine).