If we have to trigger a hugetlb folio copy during fork() because the anon folio might be pinned, we currently unconditionally create a writable PTE. However, the VMA might not have write permissions (VM_WRITE) at that point. Fix it by checking the VMA for VM_WRITE. Make the code less error prone by moving checking for VM_WRITE into make_huge_pte(), and letting callers only specify whether we should try making it writable. A simple reproducer that longter-pins the folios using liburing to then mprotect(PROT_READ) the folios befor fork() [1] results in: Before: [FAIL] access should not have worked After: [PASS] access did not work as expected [1] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/hugetlb-mkwrite-fork.c This is rather a corner case, so stable might not be warranted. Fixes: 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm") Cc: Muchun Song <muchun.song@xxxxxxxxx> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Cc: Peter Xu <peterx@xxxxxxxxxx> Cc: Guillaume Morin <guillaume@xxxxxxxxxxx> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> --- mm/hugetlb.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5c8de0f5c760..6db4e8176303 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5141,12 +5141,12 @@ const struct vm_operations_struct hugetlb_vm_ops = { }; static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, - int writable) + bool try_mkwrite) { pte_t entry; unsigned int shift = huge_page_shift(hstate_vma(vma)); - if (writable) { + if (try_mkwrite && (vma->vm_flags & VM_WRITE)) { entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page, vma->vm_page_prot))); } else { @@ -5199,7 +5199,7 @@ static void hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr, struct folio *new_folio, pte_t old, unsigned long sz) { - pte_t newpte = make_huge_pte(vma, &new_folio->page, 1); + pte_t newpte = make_huge_pte(vma, &new_folio->page, true); __folio_mark_uptodate(new_folio); hugetlb_add_new_anon_rmap(new_folio, vma, addr); @@ -6223,8 +6223,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping, hugetlb_add_new_anon_rmap(folio, vma, vmf->address); else hugetlb_add_file_rmap(folio); - new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE) - && (vma->vm_flags & VM_SHARED))); + new_pte = make_huge_pte(vma, &folio->page, vma->vm_flags & VM_SHARED); /* * If this pte was previously wr-protected, keep it wr-protected even * if populated. @@ -6556,7 +6555,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, spinlock_t *ptl; int ret = -ENOMEM; struct folio *folio; - int writable; bool folio_in_pagecache = false; if (uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON)) { @@ -6710,12 +6708,8 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY * with wp flag set, don't set pte write bit. */ - if (wp_enabled || (is_continue && !vm_shared)) - writable = 0; - else - writable = dst_vma->vm_flags & VM_WRITE; - - _dst_pte = make_huge_pte(dst_vma, &folio->page, writable); + _dst_pte = make_huge_pte(dst_vma, &folio->page, + !wp_enabled && !(is_continue && !vm_shared)); /* * Always mark UFFDIO_COPY page dirty; note that this may not be * extremely important for hugetlbfs for now since swapping is not -- 2.47.1