On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@xxxxxxxxxx> wrote:

> On UFFDIO_COPY, if we fail to copy the page contents while holding the
> hugetlb_fault_mutex, we will drop the mutex and return to the caller
> after allocating a page that consumed a reservation. In this case there
> may be a fault that double consumes the reservation. To handle this, we
> free the allocated page, fix the reservations, and allocate a temporary
> hugetlb page and return that to the caller. When the caller does the
> copy outside of the lock, we again check the cache, and allocate a page
> consuming the reservation, and copy over the contents.
>
> Test:
> Hacked the code locally such that resv_huge_pages underflows produce
> a warning and the copy_huge_page_from_user() always fails, then:
>
> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
> 	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> ./tools/testing/selftests/vm/userfaultfd hugetlb 10
> 	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
>
> Both tests succeed and produce no warnings. After the
> test runs number of free/resv hugepages is correct.

Many conflicts here with material that is queued for 5.14-rc1.

How serious is this problem?  Is a -stable backport warranted?

If we decide to get this into 5.13 (and perhaps -stable) then I can
take a look at reworking all the 5.14 material on top.  If not very
serious then we could rework this on top of the already queued
material.