It is now possible to walk the vma tree under the rcu read lock, and doing so is beneficial because it reduces lock contention. However, doing so while a MAP_FIXED mapping is executing means that a reader may see a gap in the vma tree that should never logically exist, and that a reader holding the mmap lock in read mode never sees. The temporal gap exists because mmap_region() calls munmap() before installing the new mapping.

This patch set stops rcu readers from seeing the temporal gap by splitting the munmap() function into two parts. The first part prepares the vma tree for modification by doing the necessary splits, and tracks the vmas marked for removal in a side tree. The second part completes the munmapping of the vmas after the vma tree has been overwritten (either by a MAP_FIXED replacement vma or by a NULL in the munmap() case).

Please note that rcu walkers will still be able to see a temporary state of split vmas that may be in the process of being removed, but the temporal gap will not be exposed. vma_start_write() is called on both parts of the split vma, so this state is detectable.

If the existing vmas have a vm_ops->close(), it is called before the new vmas are mapped (and the ptes are cleared out). Without calling ->close(), the hugetlbfs tests fail (hugemmap06 specifically) because resources are still marked as 'busy'. Unfortunately, calling the corresponding ->open() may not restore the state of the vmas, so it is safer to keep the existing failure scenario, where a gap is inserted and never replaced. The failure scenario is in its own patch (0015) for traceability.
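The difference between "munmap() first, then install" and the two-part approach can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not code from the series: the vma tree is modelled as a fixed array of page slots where 0 means unmapped, an rcu reader is simulated by checking the range between writer steps, and vma splitting, locking and ->close() are not modelled. All names (toy_mm, toy_gather, naive_map_fixed, two_phase_map_fixed, reader_observe) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not kernel code: the "vma tree" is an array of
 * page slots, each holding a mapping id; 0 means unmapped (a gap).
 * Vma splitting, locking and ->close() are not modelled.
 */
#define TOY_NPAGES 8

struct toy_mm {
	int slot[TOY_NPAGES];
};

/* Side structure recording the mappings gathered for removal. */
struct toy_munmap {
	int ids[TOY_NPAGES];
	size_t nr;
};

/* True if any page in [start, end) is unmapped. */
static bool has_gap(const struct toy_mm *mm, unsigned long start,
		    unsigned long end)
{
	for (unsigned long i = start; i < end; i++)
		if (mm->slot[i] == 0)
			return true;
	return false;
}

/* Simulated lockless reader, run between writer steps. */
static bool reader_saw_gap;

static void reader_observe(const struct toy_mm *mm, unsigned long start,
			   unsigned long end)
{
	if (has_gap(mm, start, end))
		reader_saw_gap = true;
}

/* Naive MAP_FIXED: unmap the whole range first, then install. */
static void naive_map_fixed(struct toy_mm *mm, int new_id,
			    unsigned long start, unsigned long end)
{
	for (unsigned long i = start; i < end; i++)
		mm->slot[i] = 0;
	reader_observe(mm, start, end);	/* the temporal gap is visible here */
	for (unsigned long i = start; i < end; i++)
		mm->slot[i] = new_id;
	reader_observe(mm, start, end);
}

/* Part 1: record the mappings in [start, end) without touching the tree. */
static void toy_gather(const struct toy_mm *mm, struct toy_munmap *ms,
		       unsigned long start, unsigned long end)
{
	ms->nr = 0;
	for (unsigned long i = start; i < end; i++) {
		int id = mm->slot[i];

		if (id != 0 && (ms->nr == 0 || ms->ids[ms->nr - 1] != id))
			ms->ids[ms->nr++] = id;
	}
}

/*
 * Two-part MAP_FIXED: gather, overwrite old entries directly with the new
 * one, then complete the teardown. On (simulated) failure the range is
 * left as a gap. Returns 0 on success, -1 on failure; *torn_down receives
 * the number of old mappings removed.
 */
static int two_phase_map_fixed(struct toy_mm *mm, int new_id,
			       unsigned long start, unsigned long end,
			       bool fail, size_t *torn_down)
{
	struct toy_munmap ms;

	toy_gather(mm, &ms, start, end);
	reader_observe(mm, start, end);	/* old mappings still present */

	if (fail) {
		/* Old mappings are gone; leave a gap rather than restore them. */
		for (unsigned long i = start; i < end; i++)
			mm->slot[i] = 0;
		*torn_down = ms.nr;
		return -1;
	}

	for (unsigned long i = start; i < end; i++) {
		mm->slot[i] = new_id;	/* old id -> new id, never 0 */
		reader_observe(mm, start, end);
	}

	/* Part 2: complete the munmap of the gathered mappings. */
	*torn_down = ms.nr;
	return 0;
}
```

In the naive path the simulated reader observes an empty range between the unmap step and the insertion. In the two-part path every slot changes directly from an old mapping id to the new one, so the range is never empty; in the kernel the overwrite is a vma tree store rather than a per-slot loop, which this toy does not attempt to reproduce. The failure branch loosely mirrors the behaviour the series keeps: once the old mappings are torn down, the range is left as a gap rather than restored.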
RFC: https://lore.kernel.org/linux-mm/20240531163217.1584450-1-Liam.Howlett@xxxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20240611180200.711239-1-Liam.Howlett@xxxxxxxxxx/
v2: https://lore.kernel.org/all/20240625191145.3382793-1-Liam.Howlett@xxxxxxxxxx/
v3: https://lore.kernel.org/linux-mm/20240704182718.2653918-1-Liam.Howlett@xxxxxxxxxx/
v4: https://lore.kernel.org/linux-mm/20240710192250.4114783-1-Liam.Howlett@xxxxxxxxxx/
v5: https://lore.kernel.org/linux-mm/20240717200709.1552558-1-Liam.Howlett@xxxxxxxxxx/

Changes since v5:
 - Rebased on akpm/mm-unstable + the mseal patches by Pedro
 - The rebase means that almost all of these patches had to be reworked to modify mm/vma.c and mm/vma.h.
 - Removed the arch_unmap() changes, as the call is no longer in mm-unstable
 - Dropped the mseal changes in favour of Pedro's mseal changes. These patches conflict heavily in munmap(), so I can fix this up depending on the solution chosen for mseal(), if needed.
 - Added a patch to create the gap if call_mmap() fails and vmas were closed (patch 15)
 - vms_complete_munmap_vmas() now checks whether the lock should be downgraded regardless of whether there is a vma. As a side effect, the vma_munmap_struct must always set the mm.

Liam R. Howlett (20):
  mm/vma: Correctly position vma_iterator in __split_vma()
  mm/vma: Introduce abort_munmap_vmas()
  mm/vma: Introduce vmi_complete_munmap_vmas()
  mm/vma: Extract the gathering of vmas from do_vmi_align_munmap()
  mm/vma: Introduce vma_munmap_struct for use in munmap operations
  mm/vma: Change munmap to use vma_munmap_struct() for accounting and surrounding vmas
  mm/vma: Extract validate_mm() from vma_complete()
  mm/vma: Inline munmap operation in mmap_region()
  mm/vma: Expand mmap_region() munmap call
  mm/vma: Support vma == NULL in init_vma_munmap()
  mm/mmap: Reposition vma iterator in mmap_region()
  mm/vma: Track start and end for munmap in vma_munmap_struct
  mm: Clean up unmap_region() argument list
  mm/mmap: Avoid zeroing vma tree in mmap_region()
  mm: Change failure of MAP_FIXED to restoring the gap on failure
  mm/mmap: Use PHYS_PFN in mmap_region()
  mm/mmap: Use vms accounted pages in mmap_region()
  ipc/shm, mm: Drop do_vma_munmap()
  mm: Move may_expand_vm() check in mmap_region()
  mm/vma: Drop incorrect comment from vms_gather_munmap_vmas()

 include/linux/mm.h |   6 +-
 ipc/shm.c          |   8 +-
 mm/mmap.c          | 138 +++++++++---------
 mm/vma.c           | 355 +++++++++++++++++++++++++++------------------
 mm/vma.h           | 153 ++++++++++++++++---
 5 files changed, 415 insertions(+), 245 deletions(-)

-- 
2.43.0