The patch titled
     Subject: mm/hugetlb: don't map folios writable without VM_WRITE when copying during fork()
has been added to the -mm mm-unstable branch.  Its filename is
     mm-hugetlb-dont-map-folios-writable-without-vm_write-when-copying-during-fork.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-dont-map-folios-writable-without-vm_write-when-copying-during-fork.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: David Hildenbrand <david@xxxxxxxxxx>
Subject: mm/hugetlb: don't map folios writable without VM_WRITE when copying during fork()
Date: Wed, 4 Dec 2024 16:31:00 +0100

If we have to trigger a hugetlb folio copy during fork() because the anon
folio might be pinned, we currently unconditionally create a writable PTE.
However, the VMA might not have write permissions (VM_WRITE) at that
point.

Fix it by checking the VMA for VM_WRITE.  Make the code less error prone
by moving checking for VM_WRITE into make_huge_pte(), and letting callers
only specify whether we should try making it writable.

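The permission decision described above can be modeled in userspace.  This
is only a minimal sketch, not kernel code: struct model_vma,
MODEL_VM_WRITE, model_pte_writable() and model_selfcheck() are
hypothetical names made up for illustration, and the flag value is
arbitrary.  It models nothing but the "caller asks, VMA decides" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VM_WRITE; the value is arbitrary here. */
#define MODEL_VM_WRITE 0x2UL

/* Hypothetical stand-in for the one VMA field this sketch needs. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Models only the permission decision: the entry may become writable
 * when the caller asks for it AND the VMA itself permits writes.
 */
bool model_pte_writable(const struct model_vma *vma, bool try_mkwrite)
{
	return try_mkwrite && (vma->vm_flags & MODEL_VM_WRITE);
}

/* Walks through the three interesting cases; aborts on a mismatch. */
void model_selfcheck(void)
{
	struct model_vma ro = { .vm_flags = 0 };
	struct model_vma rw = { .vm_flags = MODEL_VM_WRITE };

	/* Copy during fork() passes true; a read-only VMA stays read-only. */
	assert(!model_pte_writable(&ro, true));
	/* Same request on a writable VMA yields a writable entry. */
	assert(model_pte_writable(&rw, true));
	/* A caller that does not ask never gets a writable entry. */
	assert(!model_pte_writable(&rw, false));
}
```

With the check living in one helper, a caller that passes true
unconditionally can no longer bypass the VMA permissions.
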
A simple reproducer that longterm-pins the folios using liburing to then
mprotect(PROT_READ) the folios before fork() [1] results in:

Before:
[FAIL] access should not have worked

After:
[PASS] access did not work as expected

[1] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/hugetlb-mkwrite-fork.c

This is rather a corner case, so stable might not be warranted.

Link: https://lkml.kernel.org/r/20241204153100.1967364-1-david@xxxxxxxxxx
Fixes: 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
Acked-by: Peter Xu <peterx@xxxxxxxxxx>
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: Guillaume Morin <guillaume@xxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/hugetlb.c |   18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-dont-map-folios-writable-without-vm_write-when-copying-during-fork
+++ a/mm/hugetlb.c
@@ -5155,12 +5155,12 @@ const struct vm_operations_struct hugetl
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-				int writable)
+		bool try_mkwrite)
 {
 	pte_t entry;
 	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
-	if (writable) {
+	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
 					 vma->vm_page_prot)));
 	} else {
@@ -5220,7 +5220,7 @@ static void
 hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
 		      struct folio *new_folio, pte_t old, unsigned long sz)
 {
-	pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+	pte_t newpte = make_huge_pte(vma, &new_folio->page, true);
 
 	__folio_mark_uptodate(new_folio);
 	hugetlb_add_new_anon_rmap(new_folio, vma, addr);
@@ -6239,8 +6239,7 @@ static vm_fault_t hugetlb_no_page(struct
 		hugetlb_add_new_anon_rmap(folio, vma, vmf->address);
 	else
 		hugetlb_add_file_rmap(folio);
-	new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
-				&& (vma->vm_flags & VM_SHARED)));
+	new_pte = make_huge_pte(vma, &folio->page, vma->vm_flags & VM_SHARED);
 	/*
 	 * If this pte was previously wr-protected, keep it wr-protected even
 	 * if populated.
@@ -6572,7 +6571,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
 	struct folio *folio;
-	int writable;
 	bool folio_in_pagecache = false;
 
 	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON)) {
@@ -6727,12 +6725,8 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_
 	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
 	 * with wp flag set, don't set pte write bit.
 	 */
-	if (wp_enabled || (is_continue && !vm_shared))
-		writable = 0;
-	else
-		writable = dst_vma->vm_flags & VM_WRITE;
-
-	_dst_pte = make_huge_pte(dst_vma, &folio->page, writable);
+	_dst_pte = make_huge_pte(dst_vma, &folio->page,
+				 !wp_enabled && !(is_continue && !vm_shared));
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
_

Patches currently in -mm which might be from david@xxxxxxxxxx are

mm-mempolicy-fix-migrate_to_node-assuming-there-is-at-least-one-vma-in-a-mm.patch
mm-filemap-dont-call-folio_test_locked-without-a-reference-in-next_uptodate_folio.patch
docs-tmpfs-update-the-large-folios-policy-for-tmpfs-and-shmem.patch
mm-memory_hotplug-move-debug_pagealloc_map_pages-into-online_pages_range.patch
mm-page_isolation-dont-pass-gfp-flags-to-isolate_single_pageblock.patch
mm-page_isolation-dont-pass-gfp-flags-to-start_isolate_page_range.patch
mm-page_alloc-make-__alloc_contig_migrate_range-static.patch
mm-page_alloc-sort-out-the-alloc_contig_range-gfp-flags-mess.patch
mm-page_alloc-forward-the-gfp-flags-from-alloc_contig_range-to-post_alloc_hook.patch
powernv-memtrace-use-__gfp_zero-with-alloc_contig_pages.patch
mm-hugetlb-dont-map-folios-writable-without-vm_write-when-copying-during-fork.patch