On Thu, Jul 4, 2024 at 11:27 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
>
> Create vmi_gather_munmap_vmas() to handle the gathering of vmas into a
> detached maple tree for removal later.  Part of the gathering is the
> splitting of vmas that span the boundary.
>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> ---
>  mm/mmap.c | 82 +++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 58 insertions(+), 24 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 411798f46932..8dc8ffbf9d8d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2656,32 +2656,29 @@ vmi_complete_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  }
>
>  /*
> - * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
> + * vmi_gather_munmap_vmas() - Put all VMAs within a range into a maple tree
> + * for removal at a later date.  Handles splitting first and last if necessary
> + * and marking the vmas as isolated.
> + *
>  * @vmi: The vma iterator
>  * @vma: The starting vm_area_struct
>  * @mm: The mm_struct
>  * @start: The aligned start address to munmap.
>  * @end: The aligned end address to munmap.
>  * @uf: The userfaultfd list_head
> - * @unlock: Set to true to drop the mmap_lock.  unlocking only happens on
> - * success.
> + * @mas_detach: The maple state tracking the detached tree
>  *
> - * Return: 0 on success and drops the lock if so directed, error and leaves the
> - * lock held otherwise.
> + * Return: 0 on success
>  */
>  static int
> -do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		struct mm_struct *mm, unsigned long start,
> -		unsigned long end, struct list_head *uf, bool unlock)
> +		unsigned long end, struct list_head *uf,
> +		struct ma_state *mas_detach, unsigned long *locked_vm)
>  {
>  	struct vm_area_struct *next = NULL;
> -	struct maple_tree mt_detach;
>  	int count = 0;
>  	int error = -ENOMEM;
> -	unsigned long locked_vm = 0;
> -	MA_STATE(mas_detach, &mt_detach, 0, 0);
> -	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
> -	mt_on_stack(mt_detach);
>
>  	/*
>  	 * If we need to split any vma, do it now to save pain later.
> @@ -2720,15 +2717,14 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  			goto end_split_failed;
>  		}
>  		vma_start_write(next);
> -		mas_set(&mas_detach, count);
> -		error = mas_store_gfp(&mas_detach, next, GFP_KERNEL);
> +		mas_set(mas_detach, count++);
> +		if (next->vm_flags & VM_LOCKED)
> +			*locked_vm += vma_pages(next);

Uh, this was confusing. You moved the locked_vm/count accounting before
mas_store_gfp(), so if mas_store_gfp() fails, they will be off by one: we
incremented them but failed to insert the element. Only later did I realize
that if mas_store_gfp() fails, we never use these counters. The code is still
correct, but I'm wondering whether this movement was necessary. We wouldn't
use the wrong values, but why make them wrong in the first place? In later
patches you account for more things here, and all of that is also done before
mas_store_gfp(). Would moving it all after mas_store_gfp(), so the counters
stay correct at all times, be an issue?
> +
> +		error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
>  		if (error)
>  			goto munmap_gather_failed;
>  		vma_mark_detached(next, true);
> -		if (next->vm_flags & VM_LOCKED)
> -			locked_vm += vma_pages(next);
> -
> -		count++;
>  		if (unlikely(uf)) {
>  			/*
>  			 * If userfaultfd_unmap_prep returns an error the vmas
> @@ -2753,7 +2749,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
>  	/* Make sure no VMAs are about to be lost. */
>  	{
> -		MA_STATE(test, &mt_detach, 0, 0);
> +		MA_STATE(test, mas_detach->tree, 0, 0);
>  		struct vm_area_struct *vma_mas, *vma_test;
>  		int test_count = 0;
>
> @@ -2773,6 +2769,48 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	while (vma_iter_addr(vmi) > start)
>  		vma_iter_prev_range(vmi);
>
> +	return 0;
> +
> +userfaultfd_error:
> +munmap_gather_failed:
> +end_split_failed:
> +	abort_munmap_vmas(mas_detach);
> +start_split_failed:
> +map_count_exceeded:
> +	return error;
> +}
> +
> +/*
> + * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
> + * @vmi: The vma iterator
> + * @vma: The starting vm_area_struct
> + * @mm: The mm_struct
> + * @start: The aligned start address to munmap.
> + * @end: The aligned end address to munmap.
> + * @uf: The userfaultfd list_head
> + * @unlock: Set to true to drop the mmap_lock.  unlocking only happens on
> + * success.
> + *
> + * Return: 0 on success and drops the lock if so directed, error and leaves the
> + * lock held otherwise.
> + */
> +static int
> +do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> +		struct mm_struct *mm, unsigned long start,
> +		unsigned long end, struct list_head *uf, bool unlock)
> +{
> +	struct maple_tree mt_detach;
> +	MA_STATE(mas_detach, &mt_detach, 0, 0);
> +	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
> +	mt_on_stack(mt_detach);
> +	int error;
> +	unsigned long locked_vm = 0;
> +
> +	error = vmi_gather_munmap_vmas(vmi, vma, mm, start, end, uf,
> +			&mas_detach, &locked_vm);
> +	if (error)
> +		goto gather_failed;
> +
>  	error = vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL);
>  	if (error)
>  		goto clear_tree_failed;
> @@ -2783,12 +2821,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	return 0;
>
>  clear_tree_failed:
> -userfaultfd_error:
> -munmap_gather_failed:
> -end_split_failed:
>  	abort_munmap_vmas(&mas_detach);
> -start_split_failed:
> -map_count_exceeded:
> +gather_failed:
>  	validate_mm(mm);
>  	return error;
>  }
> --
> 2.43.0
>