On Fri, Apr 12, 2024 at 8:19 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 12, 2024 at 09:53:29AM -0500, Suren Baghdasaryan wrote:
> > Unless vmf_anon_prepare() already explains why vma->anon_vma poses a
> > problem for per-vma locks, we should have an explanation there. This
> > comment would serve that purpose IMO.
>
> I'll do you one better; here's some nice kernel-doc for
> vmf_anon_prepare():

I was looking at find_tcp_vma(), which seems to be the only other place
where lock_vma_under_rcu() is currently used. I think it's only used
there for file-backed pages, so I don't think your change affects that
use case, but it does make me think we should have some kind of warning
for future lock_vma_under_rcu() users... Maybe your addition of
mmap_assert_locked() inside __anon_vma_prepare() is enough. Please
don't forget to include that assertion in your final patch (I've put a
rough sketch of what I mean at the bottom, below your quoted diff).

>
> commit f89a1cd17f13
> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Date:   Fri Apr 12 10:41:02 2024 -0400
>
>     mm: Delay the check for a NULL anon_vma
>
>     Instead of checking the anon_vma early in the fault path where all page
>     faults pay the cost, delay it until we know we're going to need the
>     anon_vma to be filled in.  This will have a slight negative effect on the
>     first fault in an anonymous VMA, but it shortens every other page fault.
>     It also makes the code slightly cleaner as the anon and file backed
>     fault handling look more similar.
>
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d8d2ed80b0bf..718f91f74a48 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1057,11 +1057,13 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	gfp_t gfp;
>  	struct folio *folio;
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +	vm_fault_t ret;
>
>  	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
>  		return VM_FAULT_FALLBACK;
> -	if (unlikely(anon_vma_prepare(vma)))
> -		return VM_FAULT_OOM;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	khugepaged_enter_vma(vma, vma->vm_flags);
>
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index 6e2fe960473d..46b509c3bbc1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3213,6 +3213,21 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
>  	return VM_FAULT_RETRY;
>  }
>
> +/**
> + * vmf_anon_prepare - Prepare to handle an anonymous fault.
> + * @vmf: The vm_fault descriptor passed from the fault handler.
> + *
> + * When preparing to insert an anonymous page into a VMA from a
> + * fault handler, call this function rather than anon_vma_prepare().
> + * If this vma does not already have an associated anon_vma and we are
> + * only protected by the per-VMA lock, the caller must retry with the
> + * mmap_lock held.  __anon_vma_prepare() will look at adjacent VMAs to
> + * determine if this VMA can share its anon_vma, and that's not safe to
> + * do with only the per-VMA lock held for this VMA.
> + *
> + * Return: 0 if fault handling can proceed.  Any other value should be
> + * returned to the caller.
> + */
>  vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> @@ -4437,8 +4452,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	}
>
>  	/* Allocate our own private page. */
> -	if (unlikely(anon_vma_prepare(vma)))
> -		goto oom;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
>  	folio = alloc_anon_folio(vmf);
>  	if (IS_ERR(folio))
> @@ -5821,15 +5837,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  	if (!vma_start_read(vma))
>  		goto inval;
>
> -	/*
> -	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
> -	 * This check must happen after vma_start_read(); otherwise, a
> -	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
> -	 * from its anon_vma.
> -	 */
> -	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
> -		goto inval_end_read;
> -
>  	/* Check since vm_start/vm_end might change before we lock the VMA */
>  	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
>  		goto inval_end_read;
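
To spell out what I have in mind for that assertion (an untested sketch
on my side, assuming it sits near the top of __anon_vma_prepare() in
mm/rmap.c):

	/*
	 * find_mergeable_anon_vma() looks at adjacent VMAs, which are not
	 * covered by this VMA's per-VMA lock, so setting up an anon_vma is
	 * only safe for callers holding the mmap_lock.
	 */
	mmap_assert_locked(vma->vm_mm);

That way any future lock_vma_under_rcu() user that reaches anon_vma
allocation while holding only the per-VMA lock gets caught right away
instead of racing silently.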