Andrew: This is rebased on v8 of Liam's series [4], so the ordering between
our series should be to merge his first and then mine on top of that.
Thanks!

The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.

It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.

The fact that the function requires there to be a numbering system to cover
each possible eventuality with references to each in the many branches of
its implementation as to which case you are looking at speaks to all this.

Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.

However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.

Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and myself have additionally gone to lengths to try to
improve things further.

However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while maintaining
confidence in its correct operation.

An attempt was previously made by Vlastimil [0] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.

This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.

Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.
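
As a rough illustration of why a single expand operation that can also
delete the following VMA is enough for the NEW cases, here is a userland
toy model. It is not kernel code: toy_vma, toy_expand() and toy_merge_new()
are hypothetical names invented for this sketch, and mergeability is
reduced to "adjacent and identical flags", which is far simpler than the
real checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: a half-open range [start, end) plus flags. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	bool deleted;	/* Set when a merge swallows this VMA. */
};

/* Grow @vma to span [start, end), optionally swallowing @next. */
static void toy_expand(struct toy_vma *vma, unsigned long start,
		       unsigned long end, struct toy_vma *next)
{
	assert(start < end);

	vma->start = start;
	vma->end = end;
	if (next)
		next->deleted = true;
}

/*
 * Try to merge a NEW range [start, end) with its neighbours. Returns the
 * VMA that now covers the range, or NULL if nothing could be merged and
 * the caller would have to allocate a fresh VMA.
 */
static struct toy_vma *toy_merge_new(struct toy_vma *prev,
				     struct toy_vma *next,
				     unsigned long start, unsigned long end,
				     unsigned long flags)
{
	bool left = prev && prev->end == start && prev->flags == flags;
	bool right = next && next->start == end && next->flags == flags;

	if (left && right) {
		/* Merge both: expand prev across the gap and delete next. */
		toy_expand(prev, prev->start, next->end, next);
		return prev;
	}
	if (left) {
		/* Merge left: expand prev forwards over the new range. */
		toy_expand(prev, prev->start, end, NULL);
		return prev;
	}
	if (right) {
		/* Merge right: expand next backwards over the new range. */
		toy_expand(next, start, next->end, NULL);
		return next;
	}
	return NULL;
}
```

In this model every outcome for a new range is either "no merge" or one
call to the expand helper, with the both-sides case being the only one
that deletes a VMA.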

By abstracting the actual final commit of changes to a VMA to its own
function, commit_merge() and writing a wrapper around vma_expand() for new
VMA cases vma_merge_new_range(), we can avoid having to use vma_merge() for
these instances altogether.

By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.

Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.

This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the latter
2 being either partial or full merges).

The code is heavily annotated with ASCII diagrams and greatly simplified in
comparison to the existing vma_merge() function.

Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).

This refactor means we do not need to guess, so instead explicitly only
disallow merge in instances where a VMA with a vm_ops->close() hook would
be deleted (and try a smaller merge in cases where this is possible).

In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.

There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.
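
To make the vm_ops->close() rule concrete, the following is a minimal
userland sketch of the decision for a NEW range only. It is not the
kernel implementation: enum toy_merge and toy_pick_merge() are
hypothetical names, and the can_left/can_right inputs stand in for all of
the real mergeability checks. The one assumption it encodes is the rule
described above: a merge is only refused when it would delete a VMA that
has a close() hook, and a smaller merge is tried instead.

```c
#include <stdbool.h>

enum toy_merge {
	TOY_MERGE_NONE,
	TOY_MERGE_LEFT,		/* Expand prev over the new range. */
	TOY_MERGE_RIGHT,	/* Expand next backwards over the new range. */
	TOY_MERGE_BOTH,		/* Expand prev over range and next; next is deleted. */
};

/*
 * Pick the largest permitted merge for a new range.
 *
 * @can_left:       prev is adjacent to and compatible with the new range
 * @can_right:      next is adjacent to and compatible with the new range
 * @next_has_close: next has a close() hook, so it must not be deleted
 */
static enum toy_merge toy_pick_merge(bool can_left, bool can_right,
				     bool next_has_close)
{
	/* Merging both would delete next, which a close() hook forbids. */
	if (can_left && can_right && !next_has_close)
		return TOY_MERGE_BOTH;

	/* The smaller merges delete nothing, so hooks do not matter here. */
	if (can_left)
		return TOY_MERGE_LEFT;
	if (can_right)
		return TOY_MERGE_RIGHT;

	return TOY_MERGE_NONE;
}
```

With both neighbours mergeable and next possessing a close() hook, the
sketch falls back to TOY_MERGE_LEFT, i.e. prev is expanded but not merged
with next. Merges within an existing VMA have more ways to delete a VMA
and are deliberately left out of this sketch.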

The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case
7 with vma_ops->close") was tested and confirmed passing.

[0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@xxxxxxx/
[1]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@xxxxxxxxxx/
[2]:https://lore.kernel.org/linux-mm/c0ef6b6a-1c9b-4da2-a180-c8e1c73b1c28@lucifer.local/
[3]:https://lore.kernel.org/all/9dcddc2c-482b-4e12-a409-eee8d902ba26@lucifer.local/
[4]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@xxxxxxxxxx/

v3:
* Rebased on Liam's v8 'Avoid MAP_FIXED gap exposure' series [1].
* Fixed issue with copy_vma() vma iterator positioning as per [2] (formerly
  fixed via a fix patch).
* Fixed issue with vma_merge_extend() not correctly obtaining the next VMA
  as per [3] (formerly fixed via a fix patch) - Thanks Mark Brown!
* General whitespace fixes.
* Improved comments.
* Added comments for bool params for clarity.
* Removed unnecessary syntactic change in vma_merge().
* Removed unnecessary else from mmap_region().
* Introduced vma_iter_next_rewind(), are_anon_vmas_compatible(),
  can_vma_merge_left(), can_vma_merge_right().
* Cleaned up logic in vma_merge_new_range().
* Cleaned up logic in vma_merge_existing_range().
* Eliminated vma_lookup() from all VMA merge code.
* Added vma_merge_extend() regression test + confirmed fails before fix +
  passes after.
* Added copy_vma() regression test + confirmed triggers assert before fix +
  doesn't after.
* Confirmed _all_ self-tests passing at same rate before/after changes.
* Confirmed no perf impact.

v2:
* Updated tests to function without the vmg change, and moved earlier in
  series so we can test against the code _exactly_ as it was previously.
* Added vmg->mm to store mm_struct and avoid hacky container_of() in
  vma_merge() prior to refactor.
  It's logical to thread this through.
* Stopped specifying vmg->vma for vma_merge_new_vma() from the start, which
  was previously removed later in the series.
* Improved vma_modify_flags() to be better formatted for a large number of
  flags.
* Removed if (vma) { ... } logic in mmap_region() and integrated the
  approach from a later commit of putting logic into the
  if (next &&... ) block. Improved comment about why we are doing this.
* Introduced VMG_STATE() and VMG_VMA_STATE() macros and used these to avoid
  duplication of initialisation of vmg state.
* Expanded the commit message for abstracting the policy comparison to
  explain the logic.
* Reverted the use of vmg in vma_shrink() and split_vma().
* Reverted the cleanup of __split_vma() int -> bool as at this point fully
  irrelevant to series.
* Reinstated incorrectly removed vmg.uffd_ctx assignment in mmap_region().
* Removed a confusing comment about assignment of vmg.end in early version
  of mmap_region().
* Renamed vma_merge_new_vma() to vma_merge_new_range() and
  vma_merge_modified() to vma_merge_existing_range(). This makes it clearer
  what we're attempting to do.
* Stopped setting vmg parameters in do_brk_flags() that we did not set in
  the original implementation, i.e. vma parameters for things like
  anon_vma, uffd context, etc. which in the original implementation are not
  checked in can_vma_merge_after().
* Moved VM_SPECIAL maple tree rewalk out of if (!prev && !next) { ... }
  block in vma_merge_new_range() (which was changed to !next anyway). This
  should always be done in the VM_SPECIAL case if vmg->prev is specified.
* Updated vma_merge_new_range() to correct the case where prev and next
  could each be merged individually with the proposed range, but not
  together.
* Updated vma_merge_new_range() to require that the caller sets prev and
  next. This simplifies the logic and avoids unnecessary maple tree walks.
* Updated mmap_region() to update vmg->flags from vma->vm_flags on merge
  reattempt.
* Updated callers of vma_merge_new_range() to ensure we always point the
  iterator at prev if it exists.
* Added new state field to vmg to allow for errors to be returned.
* Adjusted do_brk_flags() to read vmg->state and handle memory allocation
  failures accordingly.
* Do not double-assign VM_SOFTDIRTY in do_brk_flags().
* Separated out move of vma_prepare(), init_vma_prep(), vma_complete(),
  can_vma_merge_before(), can_vma_merge_after() functions to separate
  commit.
* Adjusted commit_merge() change to initially _only_ have parameters
  relevant to vma_expand() to make review easier.
* Reinstated 'vma iterator must be pointing to start' comment in
  commit_merge().
* Adjusted commit_merge() again when introducing vma_merge_existing_range()
  to accept parameters specific to existing range merges.
* Removed unnecessary abstraction of vmg->end in vma_merge_existing_range()
  as only used once.
* Abstracted expanded parameter to local variable for clarity in
  vma_merge_existing_range().
* Unlink anon_vma objects if VMA pre-allocation fails on commit_merge() in
  vma_merge_existing_range() if any were duplicated. This was incorrectly
  excluded from the refactor.
* Moved comment from close commit regarding merge_will_delete_both to
  previous commit as unchanged behaviour.
* Corrected failure to assign vmg->flags after applying VM_ACCOUNT in
  mmap_region() (this had caused a ~5% regression in do_brk_flags()
  incidentally, now resolved).
* Added vmi assumptions and asserts in merge functions.
* Added lock asserts in merge functions.
* Added an assert to vma_merge_new_range() to ensure no VMA within
  [vmg->start, vmg->end).
* Added additional comments describing why we are moving the iterator to
  avoid maple tree re-walks.
* Added new test for the case of prev, next both with vm_ops->close()
  adding a new VMA, which should result in prev being expanded but NOT
  merged with next.
* Adjusted test code to do a mock version of anon_vma duplication, and
  cleanup after itself.
* Adjusted test code to allow vma preallocation to fail so we can test how
  we handle this.
* Added a test to assert correct anon_vma duplication behaviour.
* Added a test to assert that preallocation failure results in anon_vmas
  being unlinked.
* Corrected vma_expand() assumption - we need vma, next not prev.
* Reinstated removed VM_WARN_ON() around vp.anon_vma state in
  commit_merge().
* Rebased over Pedro + Liam's changes.
* Updated test logic to handle current->{mm,pid,comm} fields after rebase
  on Liam's changes which use these. Also added stub for pr_warn_once() for
  the same reason.
* Adjusted logic fundamentals based on rebase - vma_merge_new_range() now
  assumes vmi is pointing at the gap...
https://lore.kernel.org/all/cover.1724441678.git.lorenzo.stoakes@xxxxxxxxxx/

v1:
https://lore.kernel.org/linux-mm/cover.1722849859.git.lorenzo.stoakes@xxxxxxxxxx/

Lorenzo Stoakes (10):
  tools: improve vma test Makefile
  tools: add VMA merge tests
  mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
  mm: remove duplicated open-coded VMA policy check
  mm: abstract vma_expand() to use vma_merge_struct
  mm: avoid using vma_merge() for new VMAs
  mm: make vma_prepare() and friends static and internal to vma.c
  mm: introduce commit_merge(), abstracting final commit of merge
  mm: refactor vma_merge() into modify-only vma_merge_existing_range()
  mm: rework vm_ops->close() handling on VMA merge

 mm/mmap.c                        |  103 +--
 mm/vma.c                         | 1307 ++++++++++++++++------------
 mm/vma.h                         |  179 ++--
 tools/testing/vma/Makefile       |    6 +-
 tools/testing/vma/vma.c          | 1366 +++++++++++++++++++++++++++++-
 tools/testing/vma/vma_internal.h |   51 +-
 6 files changed, 2316 insertions(+), 696 deletions(-)

--
2.46.0