On Fri, Apr 12, 2024 at 8:19 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 12, 2024 at 09:53:29AM -0500, Suren Baghdasaryan wrote:
> > Unless vmf_anon_prepare() already explains why vma->anon_vma poses a
> > problem for per-vma locks, we should have an explanation there. This
> > comment would serve that purpose IMO.
>
> I'll do you one better; here's some nice kernel-doc for
> vmf_anon_prepare():

I was looking at find_tcp_vma(), which seems to be the only other place
where lock_vma_under_rcu() is currently used. I think it's only used
there for file-backed pages, so I don't think your change affects that
use case, but it does make me think we should have some kind of warning
for future lock_vma_under_rcu() users... Maybe your addition of
mmap_assert_locked() inside __anon_vma_prepare() is enough. Please
don't forget to include that assertion in your final patch (I've put a
rough sketch of what I mean at the bottom, below your quoted diff).

>
> commit f89a1cd17f13
> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Date:   Fri Apr 12 10:41:02 2024 -0400
>
>     mm: Delay the check for a NULL anon_vma
>
>     Instead of checking the anon_vma early in the fault path where all page
>     faults pay the cost, delay it until we know we're going to need the
>     anon_vma to be filled in.  This will have a slight negative effect on the
>     first fault in an anonymous VMA, but it shortens every other page fault.
>     It also makes the code slightly cleaner as the anon and file backed
>     fault handling look more similar.
>
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d8d2ed80b0bf..718f91f74a48 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1057,11 +1057,13 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	gfp_t gfp;
>  	struct folio *folio;
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +	vm_fault_t ret;
>
>  	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
>  		return VM_FAULT_FALLBACK;
> -	if (unlikely(anon_vma_prepare(vma)))
> -		return VM_FAULT_OOM;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	khugepaged_enter_vma(vma, vma->vm_flags);
>
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index 6e2fe960473d..46b509c3bbc1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3213,6 +3213,21 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
>  	return VM_FAULT_RETRY;
>  }
>
> +/**
> + * vmf_anon_prepare - Prepare to handle an anonymous fault.
> + * @vmf: The vm_fault descriptor passed from the fault handler.
> + *
> + * When preparing to insert an anonymous page into a VMA from a
> + * fault handler, call this function rather than anon_vma_prepare().
> + * If this vma does not already have an associated anon_vma and we are
> + * only protected by the per-VMA lock, the caller must retry with the
> + * mmap_lock held.  __anon_vma_prepare() will look at adjacent VMAs to
> + * determine if this VMA can share its anon_vma, and that's not safe to
> + * do with only the per-VMA lock held for this VMA.
> + *
> + * Return: 0 if fault handling can proceed.  Any other value should be
> + * returned to the caller.
> + */
>  vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> @@ -4437,8 +4452,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	}
>
>  	/* Allocate our own private page. */
> -	if (unlikely(anon_vma_prepare(vma)))
> -		goto oom;
> +	ret = vmf_anon_prepare(vmf);
> +	if (ret)
> +		return ret;
>  	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
>  	folio = alloc_anon_folio(vmf);
>  	if (IS_ERR(folio))
> @@ -5821,15 +5837,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  	if (!vma_start_read(vma))
>  		goto inval;
>
> -	/*
> -	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
> -	 * This check must happen after vma_start_read(); otherwise, a
> -	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
> -	 * from its anon_vma.
> -	 */
> -	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
> -		goto inval_end_read;
> -
>  	/* Check since vm_start/vm_end might change before we lock the VMA */
>  	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
>  		goto inval_end_read;
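
To spell out what I have in mind for that assertion (an untested sketch
on my side, assuming it sits near the top of __anon_vma_prepare() in
mm/rmap.c):

	/*
	 * find_mergeable_anon_vma() looks at adjacent VMAs, which are not
	 * covered by this VMA's per-VMA lock, so setting up an anon_vma is
	 * only safe for callers holding the mmap_lock.
	 */
	mmap_assert_locked(vma->vm_mm);

That way any future lock_vma_under_rcu() user that reaches anon_vma
allocation while holding only the per-VMA lock gets caught right away
instead of racing silently.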