On 3/22/21 5:49 PM, Peter Xu wrote:
> Firstly, pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout
> the stack.  Then, apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is specified
> with UFFDIO_COPY.  Introduce huge_pte_mkuffd_wp() for it.
> 
> Note that similar to how we've handled shmem, we'd better keep setting the
> dirty bit even if UFFDIO_COPY_MODE_WP is provided, so that the core mm will
> know this page contains valid data and never drop it.

There is nothing wrong with setting the dirty bit in this manner for
consistency.  But, since hugetlb pages are managed only by hugetlbfs, the
core mm will not drop them.
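For context, the userspace side of this new mode would look roughly like the
sketch below.  It is not part of the patch: the helper name and the 'uffd',
'dst', 'src' and 'huge_page_size' parameters are made up for illustration,
and it assumes the hugetlbfs range was already registered with
UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP as this series
enables.  The idea is to resolve a missing fault by installing the page with
the uffd-wp bit set, so the first write raises a write-protect event.

#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Sketch only: copy a huge page into a faulting range, write-protected. */
static int copy_huge_page_wp(int uffd, unsigned long dst, unsigned long src,
			     unsigned long huge_page_size)
{
	struct uffdio_copy copy = {
		.dst  = dst,			/* huge-page aligned destination */
		.src  = src,			/* staging buffer of one huge page */
		.len  = huge_page_size,
		.mode = UFFDIO_COPY_MODE_WP,	/* ask for the uffd-wp bit */
	};

	/* 0 on success; the kernel reports progress or -errno in copy.copy */
	return ioctl(uffd, UFFDIO_COPY, &copy);
}

On the kernel side, the wp_copy argument threaded through the calls below is
what carries that mode bit down into hugetlb_mcopy_atomic_pte().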
> 
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  include/asm-generic/hugetlb.h |  5 +++++
>  include/linux/hugetlb.h       |  6 ++++--
>  mm/hugetlb.c                  | 22 +++++++++++++++++-----
>  mm/userfaultfd.c              | 12 ++++++++----
>  4 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 8e1e6244a89d..548212eccbd6 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -27,6 +27,11 @@ static inline pte_t huge_pte_mkdirty(pte_t pte)
>  	return pte_mkdirty(pte);
>  }
>  
> +static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> +{
> +	return pte_mkuffd_wp(pte);
> +}
> +

Just want to verify: is userfaultfd wp support only enabled for x86_64 now?
I only ask because some architectures have arch-specific hugetlb pte
manipulation routines.

>  static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
>  {
>  	return pte_modify(pte, newprot);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index a7f7d5f328dc..ef8d2b8427b1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -141,7 +141,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
>  				unsigned long dst_addr,
>  				unsigned long src_addr,
>  				enum mcopy_atomic_mode mode,
> -				struct page **pagep);
> +				struct page **pagep,
> +				bool wp_copy);
>  #endif /* CONFIG_USERFAULTFD */
>  bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  				struct vm_area_struct *vma,
> @@ -321,7 +322,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  				unsigned long dst_addr,
>  				unsigned long src_addr,
>  				enum mcopy_atomic_mode mode,
> -				struct page **pagep)
> +				struct page **pagep,
> +				bool wp_copy)
>  {
>  	BUG();
>  	return 0;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index def2c7ddf3ae..f0e55b341ebd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4725,7 +4725,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  				unsigned long dst_addr,
>  				unsigned long src_addr,
>  				enum mcopy_atomic_mode mode,
> -				struct page **pagep)
> +				struct page **pagep,
> +				bool wp_copy)
>  {
>  	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
>  	struct address_space *mapping;
> @@ -4822,17 +4823,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
>  	}
>  
> -	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
> -	if (is_continue && !vm_shared)
> +	/*
> +	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
> +	 * with wp flag set, don't set pte write bit.
> +	 */
> +	if (wp_copy || (is_continue && !vm_shared))
>  		writable = 0;
>  	else
>  		writable = dst_vma->vm_flags & VM_WRITE;
>  
>  	_dst_pte = make_huge_pte(dst_vma, page, writable);
> -	if (writable)
> -		_dst_pte = huge_pte_mkdirty(_dst_pte);
> +	/*
> +	 * Always mark UFFDIO_COPY page dirty; note that this may not be
> +	 * extremely important for hugetlbfs for now since swapping is not
> +	 * supported, but we should still be clear in that this page cannot be
> +	 * thrown away at will, even if write bit not set.

As mentioned earlier, there should not be any issue with hugetlb pages being
thrown away without the dirty bit set.  Perhaps the comment should reflect
that this is mostly for consistency.

Note to self: this may help when I get back to hugetlb soft dirty support.

Other than that, patch looks good.
-- 
Mike Kravetz