On Fri, Jul 5, 2024 at 1:27 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Thu, Jul 04, 2024 at 02:27:13PM GMT, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> >
> > Set the start and end address for munmap when the prev and next are
> > gathered. This is needed to avoid incorrect addresses being used during
> > the vms_complete_munmap_vmas() function if the prev/next vma are
> > expanded.
>
> When we spoke about this separately you mentioned that specific arches may
> be more likely to encounter this issue, perhaps worth mentioning something
> about that in the commit msg? Unless I misunderstood you.
>
> >
> > Add a new helper vms_complete_pte_clear(), which is needed later and
> > will avoid growing the argument list to unmap_region() beyond the 9 it
> > already has.
>
> My word.
>
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > ---
> >  mm/internal.h |  2 ++
> >  mm/mmap.c     | 34 +++++++++++++++++++++++++++-------
> >  2 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 8cbbbe7d40f3..4c9f06669cc4 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1493,6 +1493,8 @@ struct vma_munmap_struct {
> >  	struct list_head *uf;		/* Userfaultfd list_head */
> >  	unsigned long start;		/* Aligned start addr */
> >  	unsigned long end;		/* Aligned end addr */
> > +	unsigned long unmap_start;
> > +	unsigned long unmap_end;
> >  	int vma_count;			/* Number of vmas that will be removed */
> >  	unsigned long nr_pages;		/* Number of pages being removed */
> >  	unsigned long locked_vm;	/* Number of locked pages */
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index ecf55d32e804..45443a53be76 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -525,6 +525,8 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
> >  	vms->vma_count = 0;
> >  	vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
> >  	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
> > +	vms->unmap_start = FIRST_USER_ADDRESS;
> > +	vms->unmap_end = USER_PGTABLES_CEILING;
> >  }
> >
> >  /*
> > @@ -2610,6 +2612,26 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
> >  	__mt_destroy(mas_detach->tree);
> >  }
> >
> > +
> > +static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
> > +		struct ma_state *mas_detach, bool mm_wr_locked)
> > +{
> > +	struct mmu_gather tlb;
> > +
> > +	/*
> > +	 * We can free page tables without write-locking mmap_lock because VMAs
> > +	 * were isolated before we downgraded mmap_lock.
> > +	 */
> > +	mas_set(mas_detach, 1);
> > +	lru_add_drain();
> > +	tlb_gather_mmu(&tlb, vms->mm);
> > +	update_hiwater_rss(vms->mm);
> > +	unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end, vms->vma_count, mm_wr_locked);
> > +	mas_set(mas_detach, 1);
>
> I know it's necessary as unmap_vmas() will adjust mas_detach, but it kind
> of aesthetically sucks to set it to 1, do some stuff, then set it to 1
> again. But this is not a big deal :>)
>
> > +	free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start, vms->unmap_end, mm_wr_locked);
>
> Yeah this bit definitely needs a comment I think, this is very confusing
> indeed. Under what circumstances will these differ from [vms->start,
> vms->end), etc.?
>
> I'm guessing it's to do with !vms->prev and !vms->next needing to be set to
> [FIRST_USER_ADDRESS, USER_PGTABLES_CEILING)?
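>
> If so, maybe something like this above the free_pgtables() call would do
> (just a sketch, assuming I've read the intent right; exact wording up to
> you):
>
>	/*
>	 * vms->unmap_start/unmap_end bound how far free_pgtables() may
>	 * reach: they are clamped to prev->vm_end/next->vm_start when a
>	 * neighbouring VMA exists, and fall back to FIRST_USER_ADDRESS/
>	 * USER_PGTABLES_CEILING when one does not, so the range can be
>	 * wider than [vms->start, vms->end).
>	 */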
>
> > +	tlb_finish_mmu(&tlb);
> > +}
> > +
> >  /*
> >   * vms_complete_munmap_vmas() - Finish the munmap() operation
> >   * @vms: The vma munmap struct
> > @@ -2631,13 +2653,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
> >  	if (vms->unlock)
> >  		mmap_write_downgrade(mm);
> >
> > -	/*
> > -	 * We can free page tables without write-locking mmap_lock because VMAs
> > -	 * were isolated before we downgraded mmap_lock.
> > -	 */
> > -	mas_set(mas_detach, 1);
> > -	unmap_region(mm, mas_detach, vms->vma, vms->prev, vms->next,
> > -		     vms->start, vms->end, vms->vma_count, !vms->unlock);
> > +	vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
> >  	/* Update high watermark before we lower total_vm */
> >  	update_hiwater_vm(mm);
> >  	/* Stat accounting */
> > @@ -2699,6 +2715,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  			goto start_split_failed;
> >  	}
> >  	vms->prev = vma_prev(vms->vmi);
> > +	if (vms->prev)
> > +		vms->unmap_start = vms->prev->vm_end;
> >
> >  	/*
> >  	 * Detach a range of VMAs from the mm. Using next as a temp variable as
> > @@ -2757,6 +2775,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  	}
> >
> >  	vms->next = vma_next(vms->vmi);
> > +	if (vms->next)
> > +		vms->unmap_end = vms->next->vm_start;
> >
> >  #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
> >  	/* Make sure no VMAs are about to be lost. */
> > --
> > 2.43.0
> >
>
> Other than wanting some extra comments, this looks fine and I know how
> hard-won the unmap range bit of this change was so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>

OK, another case where code duplication will be removed in the next patch. LGTM.

Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
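P.S. In case it helps other readers, the way I think about the new fields
(an illustrative sketch using this patch's names, not the literal kernel
code):

	/* Range eventually handed to free_pgtables() as floor/ceiling: */
	unsigned long floor = vms->prev ? vms->prev->vm_end : FIRST_USER_ADDRESS;
	unsigned long ceiling = vms->next ? vms->next->vm_start : USER_PGTABLES_CEILING;

	/*
	 * [floor, ceiling) can be wider than the removed VMA range
	 * [vms->start, vms->end), which is why reusing start/end for the
	 * page table free is wrong once prev/next have been expanded.
	 */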