On Fri, Jul 5, 2024 at 1:27 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Thu, Jul 04, 2024 at 02:27:13PM GMT, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> >
> > Set the start and end address for munmap when the prev and next are
> > gathered. This is needed to avoid incorrect addresses being used during
> > the vms_complete_munmap_vmas() function if the prev/next vma are
> > expanded.
>
> When we spoke about this separately you mentioned that specific arches may
> be more likely to encounter this issue, perhaps worth mentioning something
> about that in the commit msg? Unless I misunderstood you.
>
> >
> > Add a new helper vms_complete_pte_clear(), which is needed later and
> > will avoid growing the argument list to unmap_region() beyond the 9 it
> > already has.
>
> My word.
>
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > ---
> >  mm/internal.h |  2 ++
> >  mm/mmap.c     | 34 +++++++++++++++++++++++++++-------
> >  2 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 8cbbbe7d40f3..4c9f06669cc4 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1493,6 +1493,8 @@ struct vma_munmap_struct {
> >  	struct list_head *uf;		/* Userfaultfd list_head */
> >  	unsigned long start;		/* Aligned start addr */
> >  	unsigned long end;		/* Aligned end addr */
> > +	unsigned long unmap_start;
> > +	unsigned long unmap_end;
> >  	int vma_count;			/* Number of vmas that will be removed */
> >  	unsigned long nr_pages;		/* Number of pages being removed */
> >  	unsigned long locked_vm;	/* Number of locked pages */
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index ecf55d32e804..45443a53be76 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -525,6 +525,8 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
> >  	vms->vma_count = 0;
> >  	vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
> >  	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
> > +	vms->unmap_start = FIRST_USER_ADDRESS;
> > +	vms->unmap_end = USER_PGTABLES_CEILING;
> >  }
> >
> >  /*
> > @@ -2610,6 +2612,26 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
> >  	__mt_destroy(mas_detach->tree);
> >  }
> >
> > +
> > +static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
> > +		struct ma_state *mas_detach, bool mm_wr_locked)
> > +{
> > +	struct mmu_gather tlb;
> > +
> > +	/*
> > +	 * We can free page tables without write-locking mmap_lock because VMAs
> > +	 * were isolated before we downgraded mmap_lock.
> > +	 */
> > +	mas_set(mas_detach, 1);
> > +	lru_add_drain();
> > +	tlb_gather_mmu(&tlb, vms->mm);
> > +	update_hiwater_rss(vms->mm);
> > +	unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end, vms->vma_count, mm_wr_locked);
> > +	mas_set(mas_detach, 1);
>
> I know it's necessary as unmap_vmas() will adjust mas_detach, but it kind
> of aesthetically sucks to set it to 1, do some stuff, then set it to 1
> again. But this is not a big deal :>)
>
> > +	free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start, vms->unmap_end, mm_wr_locked);
>
> Yeah this bit definitely needs a comment I think, this is very confusing
> indeed. Under what circumstances will these differ from [vms->start,
> vms->end), etc.?
>
> I'm guessing it's to do with !vms->prev and !vms->next needing to be set to
> [FIRST_USER_ADDRESS, USER_PGTABLES_CEILING)?
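>
> If so, maybe something like this above the free_pgtables() call would do
> (just a sketch, assuming I've read the intent right; exact wording up to
> you):
>
>	/*
>	 * vms->unmap_start/unmap_end bound how far free_pgtables() may
>	 * reach: they are clamped to prev->vm_end/next->vm_start when a
>	 * neighbouring VMA exists, and fall back to FIRST_USER_ADDRESS/
>	 * USER_PGTABLES_CEILING when one does not, so the range can be
>	 * wider than [vms->start, vms->end).
>	 */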
>
> > +	tlb_finish_mmu(&tlb);
> > +}
> > +
> >  /*
> >   * vms_complete_munmap_vmas() - Finish the munmap() operation
> >   * @vms: The vma munmap struct
> > @@ -2631,13 +2653,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
> >  	if (vms->unlock)
> >  		mmap_write_downgrade(mm);
> >
> > -	/*
> > -	 * We can free page tables without write-locking mmap_lock because VMAs
> > -	 * were isolated before we downgraded mmap_lock.
> > -	 */
> > -	mas_set(mas_detach, 1);
> > -	unmap_region(mm, mas_detach, vms->vma, vms->prev, vms->next,
> > -		     vms->start, vms->end, vms->vma_count, !vms->unlock);
> > +	vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
> >  	/* Update high watermark before we lower total_vm */
> >  	update_hiwater_vm(mm);
> >  	/* Stat accounting */
> > @@ -2699,6 +2715,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  			goto start_split_failed;
> >  	}
> >  	vms->prev = vma_prev(vms->vmi);
> > +	if (vms->prev)
> > +		vms->unmap_start = vms->prev->vm_end;
> >
> >  	/*
> >  	 * Detach a range of VMAs from the mm. Using next as a temp variable as
> > @@ -2757,6 +2775,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  	}
> >
> >  	vms->next = vma_next(vms->vmi);
> > +	if (vms->next)
> > +		vms->unmap_end = vms->next->vm_start;
> >
> >  #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
> >  	/* Make sure no VMAs are about to be lost. */
> > --
> > 2.43.0
> >
>
> Other than wanting some extra comments, this looks fine and I know how
> hard-won the unmap range bit of this change was so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>

OK, another case where code duplication will be removed in the next patch. LGTM.

Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
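P.S. In case it helps other readers, the way I think about the new fields
(an illustrative sketch using this patch's names, not the literal kernel
code):

	/* Range eventually handed to free_pgtables() as floor/ceiling: */
	unsigned long floor = vms->prev ? vms->prev->vm_end : FIRST_USER_ADDRESS;
	unsigned long ceiling = vms->next ? vms->next->vm_start : USER_PGTABLES_CEILING;

	/*
	 * [floor, ceiling) can be wider than the removed VMA range
	 * [vms->start, vms->end), which is why reusing start/end for the
	 * page table free is wrong once prev/next have been expanded.
	 */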