Re: [PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 31 May 2021 16:25:27 -0700

On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry <almasrymina@xxxxxxxxxx> wrote:

> On UFFDIO_COPY, if we fail to copy the page contents while holding the
> hugetlb_fault_mutex, we will drop the mutex and return to the caller
> after allocating a page that consumed a reservation. In this case there
> may be a fault that double consumes the reservation. To handle this, we
> free the allocated page, fix the reservations, and allocate a temporary
> hugetlb page and return that to the caller. When the caller does the
> copy outside of the lock, we again check the cache, and allocate a page
> consuming the reservation, and copy over the contents.
> 
> Test:
> Hacked the code locally such that resv_huge_pages underflows produce
> a warning and the copy_huge_page_from_user() always fails, then:
> 
> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
>         2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> ./tools/testing/selftests/vm/userfaultfd hugetlb 10
> 	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> 
> Both tests succeed and produce no warnings. After the
> test runs number of free/resv hugepages is correct.

Many conflicts here with material that is queued for 5.14-rc1.

How serious is this problem?  Is a -stable backport warranted?

If we decide to get this into 5.13 (and perhaps -stable) then I can
take a look at reworking all the 5.14 material on top.  If not very
serious then we could rework this on top of the already queued
material.