On 05 Dec 21:50, Nathan Chancellor wrote:
>
> Hi Guillaume and s390 folks,
>
> On Thu, Dec 05, 2024 at 03:02:26AM +0100, Guillaume Morin wrote:
> >
> > Eric reported that PTRACE_POKETEXT fails when applications use hugetlb
> > for mapping text using huge pages. Before commit 1d8d14641fd9
> > ("mm/hugetlb: support write-faults in shared mappings"), PTRACE_POKETEXT
> > worked by accident, but it was buggy and silently ended up mapping pages
> > writable into the page tables even though VM_WRITE was not set.
> >
> > In general, FOLL_FORCE|FOLL_WRITE does currently not work with hugetlb.
> > Let's implement FOLL_FORCE|FOLL_WRITE properly for hugetlb, such that
> > what used to work in the past by accident now properly works, allowing
> > applications using hugetlb for text etc. to get properly debugged.
> >
> > This change might also be required to implement uprobes support for
> > hugetlb [1].
> >
> > [1] https://lore.kernel.org/lkml/ZiK50qob9yl5e0Xz@xxxxxxxxxxxxxxxxxx/
> >
> > Cc: Muchun Song <muchun.song@xxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Peter Xu <peterx@xxxxxxxxxx>
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Eric Hagberg <ehagberg@xxxxxxxxxxxxxx>
> > Signed-off-by: Guillaume Morin <guillaume@xxxxxxxxxxx>
> > ---
> > Changes in v2:
> > - Improved commit message
> > Changes in v3:
> > - Fix potential uninitialized mem access in follow_huge_pud
> > - define pud_soft_dirty when soft dirty is not enabled
> >
> >  include/linux/pgtable.h |  5 +++
> >  mm/gup.c                | 99 +++++++++++++++++++++--------------------
> >  mm/hugetlb.c            | 20 +++++----
> >  3 files changed, 66 insertions(+), 58 deletions(-)
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index adef9d6e9b1b..9335d7c82d20 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -1422,6 +1422,11 @@ static inline int pmd_soft_dirty(pmd_t pmd)
> >  	return 0;
> >  }
> >
> > +static inline int pud_soft_dirty(pud_t pud)
> > +{
> > +	return 0;
> > +}
> > +
> >  static inline pte_t pte_mksoft_dirty(pte_t pte)
> >  {
> >  	return pte;
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 746070a1d8bf..cc3eae458013 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -587,6 +587,33 @@ static struct folio *try_grab_folio_fast(struct page *page, int refs,
> >  }
> >  #endif /* CONFIG_HAVE_GUP_FAST */
> >
> > +/* Common code for can_follow_write_* */
> > +static inline bool can_follow_write_common(struct page *page,
> > +		struct vm_area_struct *vma, unsigned int flags)
> > +{
> > +	/* Maybe FOLL_FORCE is set to override it? */
> > +	if (!(flags & FOLL_FORCE))
> > +		return false;
> > +
> > +	/* But FOLL_FORCE has no effect on shared mappings */
> > +	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
> > +		return false;
> > +
> > +	/* ... or read-only private ones */
> > +	if (!(vma->vm_flags & VM_MAYWRITE))
> > +		return false;
> > +
> > +	/* ... or already writable ones that just need to take a write fault */
> > +	if (vma->vm_flags & VM_WRITE)
> > +		return false;
> > +
> > +	/*
> > +	 * See can_change_pte_writable(): we broke COW and could map the page
> > +	 * writable if we have an exclusive anonymous page ...
> > +	 */
> > +	return page && PageAnon(page) && PageAnonExclusive(page);
> > +}
> > +
> >  static struct page *no_page_table(struct vm_area_struct *vma,
> >  		unsigned int flags, unsigned long address)
> >  {
> > @@ -613,6 +640,22 @@ static struct page *no_page_table(struct vm_area_struct *vma,
> >  }
> >
> >  #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
> > +/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
> > +static inline bool can_follow_write_pud(pud_t pud, struct page *page,
> > +					struct vm_area_struct *vma,
> > +					unsigned int flags)
> > +{
> > +	/* If the pud is writable, we can write to the page. */
> > +	if (pud_write(pud))
> > +		return true;
> > +
> > +	if (!can_follow_write_common(page, vma, flags))
> > +		return false;
> > +
> > +	/* ... and a write-fault isn't required for other reasons. */
> > +	return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
>
> This looks to be one of the first uses of pud_soft_dirty() in a generic
> part of the tree from what I can tell, which shows that s390 is lacking
> it despite setting CONFIG_HAVE_ARCH_SOFT_DIRTY:
>
>   $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390-linux- mrproper defconfig mm/gup.o
>   mm/gup.c: In function 'can_follow_write_pud':
>   mm/gup.c:665:48: error: implicit declaration of function 'pud_soft_dirty'; did you mean 'pmd_soft_dirty'? [-Wimplicit-function-declaration]
>     665 |         return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
>         |                                                ^~~~~~~~~~~~~~
>         |                                                pmd_soft_dirty
>
> Is this expected?

Yikes! It does look like an oversight in the s390 code since, as you said,
it has CONFIG_HAVE_ARCH_SOFT_DIRTY and pud_mkdirty() seems to set
_REGION3_ENTRY_SOFT_DIRTY. But I'll let the s390 folks opine.

I don't mind dropping the pud part of the change (even if that's a bit of
a shame) if it's causing too many issues.

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>
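For illustration, a rough sketch of what the missing s390 helper might look like if the pud path is kept. It assumes pud_soft_dirty() on s390 would mirror the shape of pmd_soft_dirty() and simply test the same _REGION3_ENTRY_SOFT_DIRTY bit that pud_mkdirty() sets. This is a userspace model only: pud_t, pud_val(), __pud(), the bit values, the simplified pud_mkdirty(), and the hypothetical pud_clear_soft_dirty() and pud_soft_dirty_selftest() are stand-ins, not the real definitions in arch/s390/include/asm/pgtable.h.

```c
/*
 * Userspace model of a possible s390 pud_soft_dirty(), for illustration only.
 * ASSUMPTION: pud_t, pud_val(), __pud() and the bit values below are
 * simplified stand-ins, not the real arch/s390 definitions.
 */
#include <assert.h>

/* Hypothetical stand-ins for the kernel type and accessors. */
typedef struct { unsigned long pud; } pud_t;
#define pud_val(x)  ((x).pud)
#define __pud(x)    ((pud_t) { (x) })

/* Placeholder bit values; the real s390 encodings may differ. */
#define _REGION3_ENTRY_DIRTY      0x2000UL
#define _REGION3_ENTRY_SOFT_DIRTY 0x0002UL

/* Simplified model of pud_mkdirty(): sets the dirty and soft-dirty bits. */
static inline pud_t pud_mkdirty(pud_t pud)
{
	return __pud(pud_val(pud) | _REGION3_ENTRY_DIRTY |
		     _REGION3_ENTRY_SOFT_DIRTY);
}

/* Possible pud_soft_dirty(): test the bit that pud_mkdirty() sets. */
static inline int pud_soft_dirty(pud_t pud)
{
	return (pud_val(pud) & _REGION3_ENTRY_SOFT_DIRTY) != 0;
}

/* Hypothetical counterpart that clears only the soft-dirty bit. */
static inline pud_t pud_clear_soft_dirty(pud_t pud)
{
	return __pud(pud_val(pud) & ~_REGION3_ENTRY_SOFT_DIRTY);
}

/* Hypothetical self-check: clean -> dirtied -> soft-dirty cleared. */
static void pud_soft_dirty_selftest(void)
{
	pud_t pud = __pud(0);

	assert(!pud_soft_dirty(pud));
	pud = pud_mkdirty(pud);
	assert(pud_soft_dirty(pud));
	assert(!pud_soft_dirty(pud_clear_soft_dirty(pud)));
}
```

With such a helper in place, the generic `!vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud)` check in can_follow_write_pud() would see the soft-dirty state recorded by pud_mkdirty() on s390, instead of failing to build.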