On Fri, Nov 24, 2023 at 09:32:13AM +0530, Aneesh Kumar K.V wrote:
> Peter Xu <peterx@xxxxxxxxxx> writes:
>
> > Introduce per-vma begin()/end() helpers for pgtable walks. This is a
> > preparation work to merge hugetlb pgtable walkers with generic mm.
> >
> > The helpers need to be called before and after a pgtable walk, will start
> > to be needed if the pgtable walker code supports hugetlb pages. It's a
> > hook point for any type of VMA, but for now only hugetlb uses it to
> > stablize the pgtable pages from getting away (due to possible pmd
> > unsharing).
> >
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> > ---
> >  include/linux/mm.h |  3 +++
> >  mm/memory.c        | 12 ++++++++++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 64cd1ee4aacc..349232dd20fb 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4154,4 +4154,7 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
> >  	return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
> >  }
> >
> > +void vma_pgtable_walk_begin(struct vm_area_struct *vma);
> > +void vma_pgtable_walk_end(struct vm_area_struct *vma);
> > +
> >  #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index e27e2e5beb3f..3a6434b40d87 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -6180,3 +6180,15 @@ void ptlock_free(struct ptdesc *ptdesc)
> >  	kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
> >  }
> >  #endif
> > +
> > +void vma_pgtable_walk_begin(struct vm_area_struct *vma)
> > +{
> > +	if (is_vm_hugetlb_page(vma))
> > +		hugetlb_vma_lock_read(vma);
> > +}
> >
>
> That is required only if we support pmd sharing?

Correct. Note that for this specific gup code path we're not changing
the lock behavior: hugetlb_follow_page_mask() already calls
hugetlb_vma_lock_read() today, and it does so unconditionally as well.

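To make the intended pairing concrete, here is a rough userspace sketch
of how a walker would bracket its walk with the two helpers. Everything
in it is a stand-in and not kernel code: struct mock_vma, the
read_holders counter and the mock_*() names are hypothetical, and the
end() side is assumed to simply mirror begin(), since that hunk is
trimmed from the quote above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct vm_area_struct. */
struct mock_vma {
	bool is_hugetlb;	/* stands in for is_vm_hugetlb_page(vma) */
	int read_holders;	/* stands in for the hugetlb vma lock, read side */
};

/* Mirrors vma_pgtable_walk_begin(): only hugetlb VMAs take the lock. */
static void mock_walk_begin(struct mock_vma *vma)
{
	if (vma->is_hugetlb)
		vma->read_holders++;
}

/* Assumed mirror image of begin(): drop the lock only if it was taken. */
static void mock_walk_end(struct mock_vma *vma)
{
	if (vma->is_hugetlb) {
		assert(vma->read_holders > 0);
		vma->read_holders--;
	}
}

/*
 * Hypothetical walker.  Returns how many read holders were observed
 * while the "walk" was in progress, so the caller can see whether the
 * lock was held for this kind of VMA.
 */
static int mock_walk(struct mock_vma *vma)
{
	int seen;

	mock_walk_begin(vma);
	seen = vma->read_holders;	/* the actual pgtable walk would go here */
	mock_walk_end(vma);

	return seen;
}
```

With this model a hugetlb VMA sees the lock held for the duration of the
walk, while any other VMA goes through both helpers as no-ops, which is
the "hook point for any type of VMA" idea from the commit message.
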
It makes things even more complicated if we look at the recent private
mapping change that Rik introduced in bf4916922c. I think it means we'll
also take that lock if the private lock is allocated, but I'm not really
sure whether that's necessary for all pgtable walks: the hugetlb vma lock
is currently taken in almost all walk paths, and only some special paths
take the i_mmap rwsem instead of the vma lock.

Per my current understanding, the private lock was only there to avoid a
race between truncate & zapping. I have a feeling that there may be a
better way to do this than attaching different functions to the same
lock (or lock API).

In summary, the hugetlb vma lock is still complicated and may be prone
to further refactoring. But all of that needs further investigation.
Hopefully this series can be seen as completely separate from that for
now.

Thanks,

-- 
Peter Xu