REVIEWERS' NOTES:
  This relies upon the series sent in
  https://lore.kernel.org/all/cover.1742245056.git.lorenzo.stoakes@xxxxxxxxxx/
  and must be rebased against this.

  This RFC has been thoroughly tested but could do with a little further
  stress testing, especially under heavy reclaim load, so consider this a
  fairly early version, uploaded in advance of my LSF/MM/BPF topic in
  relation to this work :)

~~

A long standing issue with VMA merging of anonymous VMAs is the
requirement to maintain both vma->vm_pgoff and anon_vma compatibility
between merge candidates.

For anonymous mappings, vma->vm_pgoff (and consequently, folio->index)
refer to virtual page offsets, that is, va >> PAGE_SHIFT.

However, upon mremap() of an anonymous mapping that has been faulted (that
is, where vma->anon_vma != NULL), we would need to walk page tables to be
able to access, let alone manipulate, the folio->index and mapping fields
in order to update this virtual page offset.

Therefore in these instances we do not do so, instead retaining the
virtual page offset the VMA was first faulted in at as its vma->vm_pgoff
field, and of course consequently folio->index.

Whenever we need the appropriate offset we use linear_page_index(), which
cleverly adds to vma->vm_pgoff the difference, in pages, between the
virtual address and the actual VMA start.

Doing so in effect fragments the virtual address space, meaning that we
are no longer able to merge these VMAs with adjacent ones that could, at
least theoretically, be merged.

This also creates a difference in behaviour, often surprising to users,
between mappings which are faulted and those which are not - as for the
latter we adjust vma->vm_pgoff upon mremap() to aid mergeability.

This is problematic firstly because this proliferates kernel allocations
that are pure memory pressure - unreclaimable and unmovable - i.e.
vm_area_struct, anon_vma, anon_vma_chain objects that need not exist.

Secondly, mremap() exhibits an implicit uAPI in that it does not permit
remaps which span multiple VMAs (though it does permit remaps that
constitute a part of a single VMA).

This means that a user must concern themselves with whether merges succeed
or not should they wish to use mremap() in such a way that multiple
mremap() calls are performed upon mappings.

This series provides users with an option to accept the overhead of
actually updating the VMA and underlying folios via the
MREMAP_RELOCATE_ANON flag.

If MREMAP_RELOCATE_ANON is specified, but an ordinary merge would result
in the mremap() succeeding, then no attempt is made at relocation of
folios as this is not required.

Even if no merge is possible upon moving of the region, vma->vm_pgoff and
folio->index fields are appropriately updated in order that subsequent
mremap() or mprotect() calls will succeed in merging.

This flag falls back to the ordinary means of mremap() should the
operation not be feasible. In that case it also transparently undoes the
operation, carefully holding rmap locks such that no racing rmap operation
encounters incorrect or missing VMAs.

In addition, the MREMAP_MUST_RELOCATE_ANON flag is supplied in case the
user needs to know whether or not the operation succeeded - this flag is
identical to MREMAP_RELOCATE_ANON, except that if the operation cannot
succeed, the mremap() fails with -EFAULT.

Note that no-op mremap() operations (such as an unpopulated range, or a
merge that would trivially succeed already) will succeed under
MREMAP_MUST_RELOCATE_ANON.

mremap() already walks page tables, so this isn't an order of magnitude
increase in workload, but it does require walking to page table leaf level
and manipulating folios.

The operations all succeed under THP and in general are compatible with
underlying large folios of any size. In fact, the larger the folio, the
more efficient the operation is.

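To make the vm_pgoff mismatch described above concrete, here is a minimal
userspace sketch of the arithmetic involved. It is not kernel code and not
how the series is implemented: struct sketch_vma and every sketch_* helper
are hypothetical names invented for illustration, a 4 KiB page size is
assumed, and only the vm_pgoff half of merge compatibility is modelled
(anon_vma compatibility is ignored entirely).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct sketch_vma {
	uint64_t vm_start; /* inclusive start address */
	uint64_t vm_end;   /* exclusive end address */
	uint64_t vm_pgoff; /* page offset corresponding to vm_start */
	bool faulted;      /* models vma->anon_vma != NULL */
};

/* Same arithmetic as linear_page_index(): pages into the VMA plus vm_pgoff. */
static inline uint64_t sketch_linear_page_index(const struct sketch_vma *vma,
						uint64_t addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}

/* A fresh anonymous mapping: vm_pgoff is simply va >> PAGE_SHIFT. */
static inline struct sketch_vma sketch_map_anon(uint64_t addr, uint64_t len,
						bool faulted)
{
	struct sketch_vma vma = {
		.vm_start = addr,
		.vm_end = addr + len,
		.vm_pgoff = addr >> SKETCH_PAGE_SHIFT,
		.faulted = faulted,
	};
	return vma;
}

/*
 * Models an ordinary mremap() move: an unfaulted VMA has vm_pgoff adjusted
 * to the new address, a faulted one keeps the offset it was faulted in at.
 */
static inline struct sketch_vma sketch_move(struct sketch_vma vma,
					    uint64_t new_addr)
{
	uint64_t len = vma.vm_end - vma.vm_start;

	vma.vm_start = new_addr;
	vma.vm_end = new_addr + len;
	if (!vma.faulted)
		vma.vm_pgoff = new_addr >> SKETCH_PAGE_SHIFT;
	return vma;
}

/*
 * Models only the end result of relocation: vm_pgoff matches the new
 * address even for a faulted VMA. The real work of updating folio->index
 * for each folio is not represented here.
 */
static inline struct sketch_vma sketch_move_relocate(struct sketch_vma vma,
						     uint64_t new_addr)
{
	vma = sketch_move(vma, new_addr);
	vma.vm_pgoff = new_addr >> SKETCH_PAGE_SHIFT;
	return vma;
}

/* vm_pgoff half of the merge check: adjacent and offsets contiguous. */
static inline bool sketch_pgoff_mergeable(const struct sketch_vma *prev,
					  const struct sketch_vma *next)
{
	uint64_t prev_pages =
		(prev->vm_end - prev->vm_start) >> SKETCH_PAGE_SHIFT;

	return prev->vm_end == next->vm_start &&
	       prev->vm_pgoff + prev_pages == next->vm_pgoff;
}
```

In this model, a faulted VMA moved with sketch_move() so that it sits
directly after another VMA keeps its stale vm_pgoff and fails
sketch_pgoff_mergeable(), whereas the same move via sketch_move_relocate(),
or a move of an unfaulted VMA, leaves the offsets contiguous.
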
Performance testing indicates that time taken using MREMAP_RELOCATE_ANON
is on the same order of magnitude as ordinary mremap() operations, with
both taking time proportional to the portion of the mapping which is
populated.

Of course, mremap() operations that are entirely aligned are significantly
faster as they need only move a VMA and a smaller number of higher order
page tables, but this is unavoidable.

Lorenzo Stoakes (7):
  mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON
  mm/mremap: add MREMAP_MUST_RELOCATE_ANON
  mm/mremap: add MREMAP[_MUST]_RELOCATE_ANON support for THP folios
  tools UAPI: Update copy of linux/mman.h from the kernel sources
  tools/testing/selftests: add mremap() cases that merge normally
  tools/testing/selftests: add MREMAP_RELOCATE_ANON merge test cases
  tools/testing/selftests: expand mremap() tests for MREMAP_RELOCATE_ANON

 include/uapi/linux/mman.h                |    8 +-
 mm/internal.h                            |    1 +
 mm/mremap.c                              |  610 +++++++++-
 mm/vma.c                                 |   29 +-
 mm/vma.h                                 |    5 +-
 tools/include/uapi/linux/mman.h          |    8 +-
 tools/testing/selftests/mm/merge.c       | 1338 +++++++++++++++++++++-
 tools/testing/selftests/mm/mremap_test.c |  270 +++--
 tools/testing/vma/vma.c                  |    5 +-
 9 files changed, 2128 insertions(+), 146 deletions(-)

--
2.48.1