We forbid merging THPs for uffd-wp enabled regions by aborting the
khugepaged scan as soon as we detect a uffd-wp armed pte (either present
or swap).

That works, but it is less efficient than it needs to be, because those
ptes can only exist in VM_UFFD_WP enabled VMAs. Checking the vma flag is
cheaper, and good enough.

To be explicit: before this patch we could still merge some THPs in
VM_UFFD_WP regions as long as they had zero uffd-wp armed ptes, and after
this patch we no longer can. However, that's not a major target for thp
collapse anyway.

This mostly reverts commit e1e267c7928fe387e5e1cffeafb0de2d0473663a and
does the same check at the vma level instead, so it's not a bugfix.

This also paves the way for file-backed uffd-wp support, as the
VM_UFFD_WP flag will work for file-backed memory too.

After this patch, the khugepaged scan result for these regions switches
from SCAN_PTE_UFFD_WP to SCAN_VMA_CHECK.

Since uffd minor mode should not allow thp either, do the same thing for
minor mode, so that khugepaged stops early instead of trying to collapse
pages there.

Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---

Axel: as I asked in the other thread, please help check whether minor
mode works properly with shmem thp enabled. If it does not, I feel like
this patch could end up being part of that effort, but it's also possible
that I missed something.
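For reviewers who want the behaviour change in one place, here is a rough
userspace model of the two schemes. It is only a sketch: the MODEL_* flag
bits, struct model_vma and both helper names are made up for illustration
and are not kernel definitions. It shows the one visible difference, which
is that a uffd-wp registered vma with no armed ptes passed the old per-pte
scan but fails the new per-vma check.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_VM_UFFD_WP	(1UL << 0)
#define MODEL_VM_UFFD_MINOR	(1UL << 1)

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Old scheme: walk every pte in the candidate range and refuse to
 * collapse as soon as one uffd-wp armed pte is seen.
 */
static bool model_allowed_by_pte_scan(const bool *pte_uffd_wp, int nr_ptes)
{
	for (int i = 0; i < nr_ptes; i++) {
		if (pte_uffd_wp[i])
			return false;
	}
	return true;
}

/*
 * New scheme: a single flag test per vma, covering both wp and minor
 * mode, done before any pte is looked at.
 */
static bool model_allowed_by_vma_check(const struct model_vma *vma)
{
	return !(vma->vm_flags & (MODEL_VM_UFFD_WP | MODEL_VM_UFFD_MINOR));
}
```

With a vma that has MODEL_VM_UFFD_WP set and an all-false pte array, the
first helper returns true while the second returns false, which is the
case called out in the commit message as not being a major target.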
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
 include/trace/events/huge_memory.h |  1 -
 mm/khugepaged.c                    | 26 +++-----------------------
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4fdb14a81108..53532f5925c3 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -15,7 +15,6 @@
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")		\
-	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")			\
 	EM( SCAN_PAGE_RO,		"no_writable_page")		\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")		\
 	EM( SCAN_PAGE_NULL,		"page_null")			\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..3afe66d48db0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -31,7 +31,6 @@ enum scan_result {
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
 	SCAN_PTE_NON_PRESENT,
-	SCAN_PTE_UFFD_WP,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -467,6 +466,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 		return false;
 	if (vma_is_temporary_stack(vma))
 		return false;
+	/* Don't allow thp merging for wp/minor enabled uffd regions */
+	if (userfaultfd_wp(vma) || userfaultfd_minor(vma))
+		return false;
 	return !(vm_flags & VM_NO_KHUGEPAGED);
 }
 
@@ -1246,15 +1248,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
 			if (++unmapped <= khugepaged_max_ptes_swap) {
-				/*
-				 * Always be strict with uffd-wp
-				 * enabled swap entries. Please see
-				 * comment below for pte_uffd_wp().
-				 */
-				if (pte_swp_uffd_wp(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
-					goto out_unmap;
-				}
 				continue;
 			} else {
 				result = SCAN_EXCEED_SWAP_PTE;
@@ -1270,19 +1263,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_uffd_wp(pteval)) {
-			/*
-			 * Don't collapse the page if any of the small
-			 * PTEs are armed with uffd write protection.
-			 * Here we can also mark the new huge pmd as
-			 * write protected if any of the small ones is
-			 * marked but that could bring unknown
-			 * userfault messages that falls outside of
-			 * the registered range. So, just be simple.
-			 */
-			result = SCAN_PTE_UFFD_WP;
-			goto out_unmap;
-		}
 		if (pte_write(pteval))
 			writable = true;
 
-- 
2.31.1