VMA resources are scarce. This is a data structure whose weight we wish to
reduce (especially as slab allocations are unreclaimable and - for now -
unmigratable).

So adding additional fields is generally unviable. VMA flags are equally
contended, and differing flags prevent VMA merge, further impacting overhead.

We can however make use of the time-honoured kernel tradition of grabbing
bits where we can.

Since we can rely upon anon_vma allocations being at least system
word-aligned, we have a handful of bits in vma->anon_vma available to use
as flags.

This series establishes that mechanism, and immediately uses it to solve a
problem encountered as part of the guard region feature (MADV_GUARD_INSTALL,
MADV_GUARD_REMOVE).

We absolutely must preserve guard regions over fork. However, it turns out
that the only reasonable means of doing so is to establish an anon_vma even
if the VMA is unfaulted.

This creates unnecessary overhead, a problem exacerbated by the extension of
this functionality to file-backed regions, where memory allocated in this
way may never be utilised, and is not freed until the end of the VMA's
lifetime.

We can avoid this if we have a means of indicating to fork that we wish to
copy page tables without incurring this overhead.

Having flags available in vma->anon_vma allows us to do so - we can
therefore introduce a flag, ANON_VMA_UNFAULTED, which indicates that this is
the case.

We introduce wrapper functions to mask off these bits. As a result, nearly
every part of the kernel behaves precisely as before, with the only change
in behaviour being the desired one in the forking logic.

On fault, or any other operation that actually requires an established
anon_vma, the ANON_VMA_UNFAULTED flag is cleared and replaced by an actual
anon_vma.

An additional advantage of this mechanism is that we can also remove the
flag if no 'real' anon_vma has been established and the user executes
MADV_GUARD_REMOVE on the whole VMA, meaning we can prevent future unneeded
page table operations.
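To illustrate the approach described above, the following is a minimal
userspace sketch of storing flags in the low bits of a word-aligned pointer.
It is not the code from this series: only ANON_VMA_UNFAULTED is a name taken
from this cover letter. The bit value, the mask, the stand-in structures and
every helper name here are hypothetical, chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real anon_vma; the long member gives it word alignment. */
struct anon_vma {
	long refcount;
};

/* Stand-in VMA: one word holds an anon_vma pointer and/or flag bits. */
struct vma {
	uintptr_t anon_vma_word;
};

/* Flag name is from the cover letter; the bit value is an assumption. */
#define ANON_VMA_UNFAULTED	((uintptr_t)1)
/* Word alignment leaves the low bits of the pointer free for flags. */
#define ANON_VMA_FLAG_MASK	((uintptr_t)(sizeof(long) - 1))

/* Hypothetical wrapper: return the pointer with the flag bits masked off. */
static struct anon_vma *vma_get_anon_vma(const struct vma *vma)
{
	return (struct anon_vma *)(vma->anon_vma_word & ~ANON_VMA_FLAG_MASK);
}

/* Hypothetical wrapper: test whether the unfaulted flag is set. */
static int vma_is_flagged_unfaulted(const struct vma *vma)
{
	return (vma->anon_vma_word & ANON_VMA_UNFAULTED) != 0;
}

/* Guard install: set the flag rather than allocating an anon_vma. */
static void vma_mark_unfaulted(struct vma *vma)
{
	if (!vma_get_anon_vma(vma))
		vma->anon_vma_word |= ANON_VMA_UNFAULTED;
}

/* Guard remove over the whole VMA: drop the flag if it is still set. */
static void vma_clear_unfaulted(struct vma *vma)
{
	vma->anon_vma_word &= ~ANON_VMA_UNFAULTED;
}

/* Fault path: the flag is cleared and replaced by a real anon_vma. */
static void vma_set_anon_vma(struct vma *vma, struct anon_vma *anon_vma)
{
	/* The scheme relies on the low bits of the pointer being zero. */
	assert(((uintptr_t)anon_vma & ANON_VMA_FLAG_MASK) == 0);
	vma->anon_vma_word = (uintptr_t)anon_vma;
}

/* Fork decision (assumed): copy page tables if either condition holds. */
static int fork_must_copy_page_tables(const struct vma *vma)
{
	return vma_get_anon_vma(vma) != NULL || vma_is_flagged_unfaulted(vma);
}
```

In this sketch, callers that only want the pointer go through
vma_get_anon_vma() and never observe the flag bits, which is why code outside
the fork path would be unaffected. A flagged but unfaulted VMA reports a NULL
anon_vma, yet still asks fork to copy its page tables.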

A benefit of this change, aside from saving kernel memory allocations, is
that THP page collapse is no longer impacted if we apply guard regions then
remove them in their entirety from a VMA, as otherwise the immediate
collapse of aligned page tables in retract_page_tables() cannot proceed.

Lorenzo Stoakes (2):
  mm: introduce anon_vma flags and use wrapper functions
  mm/madvise: utilise anon_vma unfaulted flag on guard region install

 fs/coredump.c                    |  2 +-
 include/linux/mm_types.h         | 67 ++++++++++++++++++++-
 include/linux/rmap.h             |  4 +-
 kernel/fork.c                    |  4 +-
 mm/debug.c                       |  6 +-
 mm/huge_memory.c                 |  4 +-
 mm/khugepaged.c                  | 12 ++--
 mm/ksm.c                         | 16 +++---
 mm/madvise.c                     | 49 ++++++++++------
 mm/memory.c                      |  6 +-
 mm/mmap.c                        |  2 +-
 mm/mprotect.c                    |  2 +-
 mm/mremap.c                      |  8 +--
 mm/rmap.c                        | 42 +++++++-------
 mm/swapfile.c                    |  2 +-
 mm/userfaultfd.c                 |  2 +-
 mm/vma.c                         | 99 +++++++++++++++++++++++++-------
 mm/vma.h                         |  6 +-
 security/selinux/hooks.c         |  2 +-
 tools/testing/vma/vma.c          | 95 +++++++++++++++---------------
 tools/testing/vma/vma_internal.h | 78 ++++++++++++++++++++++---
 21 files changed, 358 insertions(+), 150 deletions(-)

--
2.48.1