On 8/5/24 14:13, Lorenzo Stoakes wrote:
> In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> to use vma_expand() rather than invoke a full vma_merge() operation.
>
> Abstract this logic and eliminate all of the open-coding, and also use the
> same logic for all cases where we add new VMAs so that, rather than
> ultimately using vma_merge(), we use vma_expand().
>
> We implement this by replacing vma_merge_new_vma() with this newly
> abstracted logic.
>
> Doing so removes duplication and simplifies VMA merging in all such cases,
> laying the ground for us to eliminate the merging of new VMAs in
> vma_merge() altogether.
>
> This makes it far easier to understand what is happening in these cases,
> avoiding confusion and bugs, and allowing for future optimisation.
>
> As a result of this change we are also able to make vma_prepare(),
> init_vma_prep(), vma_complete(), can_vma_merge_before() and
> can_vma_merge_after() static and internal to vma.c.

That's really great, but it would be even better if these code moves could
be a separate patch, as it would make reviewing so much easier. But with
git diff's --color-moved to the rescue, let me try...

> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> ---
>  mm/mmap.c                        |  79 ++---
>  mm/vma.c                         | 482 +++++++++++++++++++------------
>  mm/vma.h                         |  51 +---
>  tools/testing/vma/vma_internal.h |   6 +
>  4 files changed, 324 insertions(+), 294 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f6593a81f73d..c03f50f46396 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1363,8 +1363,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  {
>          struct mm_struct *mm = current->mm;
>          struct vm_area_struct *vma = NULL;
> -        struct vm_area_struct *next, *prev, *merge;
> -        pgoff_t pglen = len >> PAGE_SHIFT;
> +        struct vm_area_struct *merge;
>          unsigned long charged = 0;
>          unsigned long end = addr + len;
>          bool writable_file_mapping = false;
> @@ -1411,44 +1410,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>                  vm_flags |= VM_ACCOUNT;
>          }
>
> -        next = vmg.next = vma_next(&vmi);
> -        prev = vmg.prev = vma_prev(&vmi);
> -        if (vm_flags & VM_SPECIAL) {
> -                if (prev)
> -                        vma_iter_next_range(&vmi);
> -                goto cannot_expand;
> -        }
> -
> -        /* Attempt to expand an old mapping */
> -        /* Check next */
> -        if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
> -                /* We can adjust this as can_vma_merge_after() doesn't touch */
> -                vmg.end = next->vm_end;
> -                vma = vmg.vma = next;
> -                vmg.pgoff = next->vm_pgoff - pglen;
> -
> -                /* We may merge our NULL anon_vma with non-NULL in next. */
> -                vmg.anon_vma = vma->anon_vma;
> -        }
> -
> -        /* Check prev */
> -        if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
> -                vmg.start = prev->vm_start;
> -                vma = vmg.vma = prev;
> -                vmg.pgoff = prev->vm_pgoff;
> -        } else if (prev) {
> -                vma_iter_next_range(&vmi);
> -        }
> -
> -        /* Actually expand, if possible */
> -        if (vma && !vma_expand(&vmg)) {
> -                khugepaged_enter_vma(vma, vm_flags);
> +        vma = vma_merge_new_vma(&vmg);
> +        if (vma)
>                  goto expanded;
> -        }
> -
> -        if (vma == prev)
> -                vma_iter_set(&vmi, addr);
> -cannot_expand:
>
>          /*
>           * Determine the object being mapped and call the appropriate
> @@ -1493,10 +1457,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>           * If vm_flags changed after call_mmap(), we should try merge
>           * vma again as we may succeed this time.
>           */
> -        if (unlikely(vm_flags != vma->vm_flags && prev)) {
> -                merge = vma_merge_new_vma_wrapper(&vmi, prev, vma,
> -                                                  vma->vm_start, vma->vm_end,
> -                                                  vma->vm_pgoff);
> +        if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
> +                merge = vma_merge_new_vma(&vmg);

Can this even succeed if we don't update vmg->flags? Previously the
wrapper would take them from vma.
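To make the question concrete, I would have expected something like the
below, refreshing the descriptor from vma before retrying, the way the old
wrapper did implicitly (just a sketch, untested, reusing the vmg->flags
field this patch introduces):

        if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
                /* Pick up any flag changes made by ->mmap(). */
                vmg.flags = vma->vm_flags;
                merge = vma_merge_new_vma(&vmg);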
> +
>                  if (merge) {
>                          /*
>                           * ->mmap() can change vma->vm_file and fput

<snip>

> +/*
> + * vma_merge_new_vma - Attempt to merge a new VMA into address space
> + *
> + * @vmg: Describes the VMA we are adding, in the range @vmg->start to @vmg->end
> + *       (exclusive), which we try to merge with any adjacent VMAs if possible.
> + *
> + * We are about to add a VMA to the address space starting at @vmg->start and
> + * ending at @vmg->end. There are three different possible scenarios:
> + *
> + * 1. There is a VMA with identical properties immediately adjacent to the
> + *    proposed new VMA [@vmg->start, @vmg->end) either before or after it -
> + *    EXPAND that VMA:
> + *
> + * Proposed:       |-----|  or  |-----|
> + * Existing:  |----|                  |----|
> + *
> + * 2. There are VMAs with identical properties immediately adjacent to the
> + *    proposed new VMA [@vmg->start, @vmg->end) both before AND after it -
> + *    EXPAND the former and REMOVE the latter:
> + *
> + * Proposed:       |-----|
> + * Existing:  |----|     |----|
> + *
> + * 3. There are no VMAs immediately adjacent to the proposed new VMA or those
> + *    VMAs do not have identical attributes - NO MERGE POSSIBLE.
> + *
> + * In instances where we can merge, this function returns the expanded VMA which
> + * will have its range adjusted accordingly and the underlying maple tree also
> + * adjusted.
> + *
> + * Returns: In instances where no merge was possible, NULL. Otherwise, a pointer
> + *          to the VMA we expanded.
> + *
> + * This function also adjusts @vmg to provide @vmg->prev and @vmg->next if
> + * neither already specified, and adjusts [@vmg->start, @vmg->end) to span the
> + * expanded range.
> + *
> + * ASSUMPTIONS:
> + * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
> + * - The caller must have determined that [@vmg->start, @vmg->end) is empty.

Should we be paranoid and assert something?
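If we do want to assert it, something along these lines at the top of
vma_merge_new_vma() would be free in non-debug builds (a sketch; it
assumes vmg->mm is populated at every call site):

        /* Caller guarantees [start, end) contains no VMAs. */
        VM_WARN_ON(find_vma_intersection(vmg->mm, vmg->start, vmg->end));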
> + */
> +struct vm_area_struct *vma_merge_new_vma(struct vma_merge_struct *vmg)
> +{
> +        bool is_special = vmg->flags & VM_SPECIAL;
> +        struct vm_area_struct *prev = vmg->prev;
> +        struct vm_area_struct *next = vmg->next;
> +        unsigned long start = vmg->start;
> +        unsigned long end = vmg->end;
> +        pgoff_t pgoff = vmg->pgoff;
> +        pgoff_t pglen = PHYS_PFN(end - start);
> +
> +        VM_WARN_ON(vmg->vma);
> +
> +        if (!prev && !next) {
> +                /*
> +                 * Since the caller must have determined that the requested
> +                 * range is empty, vmg->vmi will be left pointing at the VMA
> +                 * immediately prior.
> +                 */

OK, that's perhaps not that obvious, as it seems copy_vma() is doing some
special dance to ensure this. Should we add it to the ASSUMPTIONS and
assert it, or is there a maple tree operation we can use to ensure it,
ideally one that's very cheap if the iterator is already set the way we
want it to be?

> +                next = vmg->next = vma_next(vmg->vmi);
> +                prev = vmg->prev = vma_prev(vmg->vmi);
> +
> +                /* Avoid maple tree re-walk. */
> +                if (is_special && prev)
> +                        vma_iter_next_range(vmg->vmi);

I wish I knew what this did, but it seems to be the same as what the old
code did, so hopefully that's fine.

> +        }
> +
> +        /* If special mapping or no adjacent VMAs, nothing to merge. */
> +        if (is_special || (!prev && !next))
> +                return NULL;
> +
> +        /* If we can merge with the following VMA, adjust vmg accordingly. */
> +        if (next && next->vm_start == end && can_vma_merge_before(vmg)) {
> +                /*
> +                 * We can adjust this here as can_vma_merge_after() doesn't
> +                 * touch vmg->end.
> +                 */
> +                vmg->end = next->vm_end;
> +                vmg->vma = next;
> +                vmg->pgoff = next->vm_pgoff - pglen;
> +
> +                vmg->anon_vma = next->anon_vma;
> +        }
> +
> +        /* If we can merge with the previous VMA, adjust vmg accordingly. */
> +        if (prev && prev->vm_end == start && can_vma_merge_after(vmg)) {
> +                vmg->start = prev->vm_start;
> +                vmg->vma = prev;
> +                vmg->pgoff = prev->vm_pgoff;
> +        } else if (prev) {
> +                vma_iter_next_range(vmg->vmi);
> +        }

Sigh... ditto.
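For the record, what I had in mind for the iterator question above is
making the position explicit rather than trusting the caller, e.g.
replacing just the vma_next()/vma_prev() lookup (sketch only, untested):

        if (!prev && !next) {
                /*
                 * Re-position explicitly instead of relying on the caller
                 * having left vmg->vmi pointing just before the empty
                 * range.
                 */
                vma_iter_set(vmg->vmi, vmg->start);
                next = vmg->next = vma_next(vmg->vmi);
                prev = vmg->prev = vma_prev(vmg->vmi);
        }

But vma_iter_set() resets the iterator state, so that buys clarity at the
cost of the very re-walk the current code is trying to avoid. Failing
that, documenting the precondition under ASSUMPTIONS, plus a comment
explaining that vma_iter_next_range() leaves the iterator over the gap the
new VMA will occupy (if my reading is right), would already help a lot.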