[RFC PATCH 0/2] mm: introduce anon_vma flags, reduce kernel allocs

VMA resources are scarce. The VMA is a data structure whose weight we wish
to reduce (particularly as slab allocations are unreclaimable and, for now,
unmigratable).

Adding further fields is therefore generally unviable. VMA flags are
equally contended, and also prevent VMA merge, further increasing overhead.

We can however make use of the time-honoured kernel tradition of grabbing
bits where we can.

Since we can rely upon anon_vma allocations being at least system
word-aligned, we have a handful of bits in the vma->anon_vma available to
use as flags.
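
As an illustration of the bit-stealing idea only, here is a minimal
userspace sketch. The struct, macro and function names are hypothetical
stand-ins, not the helpers this series adds, and the number of spare bits
is derived from the alignment of the stand-in type rather than from the
real struct anon_vma.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct anon_vma. */
struct anon_vma_sketch {
	long refcount;
};

/* Low bits that are always zero in a suitably aligned pointer. */
#define SKETCH_FLAGS_MASK \
	((uintptr_t)(_Alignof(struct anon_vma_sketch) - 1))

/* Combine an aligned pointer with flag bits into a single word. */
static uintptr_t sketch_pack(struct anon_vma_sketch *av, uintptr_t flags)
{
	/* The pointer must not already occupy the flag bits. */
	assert(((uintptr_t)av & SKETCH_FLAGS_MASK) == 0);
	return (uintptr_t)av | (flags & SKETCH_FLAGS_MASK);
}

/* Recover the pointer by masking off the flag bits. */
static struct anon_vma_sketch *sketch_ptr(uintptr_t word)
{
	return (struct anon_vma_sketch *)(word & ~SKETCH_FLAGS_MASK);
}

/* Recover the flag bits by masking off the pointer. */
static uintptr_t sketch_flags(uintptr_t word)
{
	return word & SKETCH_FLAGS_MASK;
}
```

With 8-byte alignment this leaves three spare bits, with 4-byte alignment
two. A flag can also be stored alongside a NULL pointer, which is the case
the rest of this series relies upon.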

This series establishes a mechanism for doing so, and immediately uses it
to solve a problem encountered as part of the guard region feature
(MADV_GUARD_INSTALL, MADV_GUARD_REMOVE).

We absolutely must preserve guard regions over fork. However, it turns out
the only reasonable means of doing so is to establish an anon_vma even if
the VMA is unfaulted.

This creates unnecessary overhead, a problem exacerbated by the extension
of this functionality to file-backed regions, where such-allocated memory
may never be utilised or freed until the end of the VMA's lifetime.

We can avoid this if we have a means of indicating to fork that we wish to
copy page tables without incurring this overhead.

Having flags available in vma->anon_vma allows us to do so - we can
therefore introduce a flag, ANON_VMA_UNFAULTED, which indicates that this
is the case.

We introduce wrapper functions to mask off these bits, and nearly every
part of the kernel behaves precisely the same as a result, with only the
desired change in behaviour in the forking logic.
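
A rough userspace sketch of how such wrappers and the fork-side check
might fit together follows. Every name and the flag's bit position are
assumptions made for illustration (the cover letter names the flag but not
its value), and the real fork logic considers more than is shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct anon_vma. */
struct anon_vma_sketch {
	long refcount;
};

/* Assumed bit position for the flag, and the mask of all spare bits. */
#define SKETCH_ANON_VMA_UNFAULTED ((uintptr_t)1)
#define SKETCH_ANON_VMA_FLAGS \
	((uintptr_t)(_Alignof(struct anon_vma_sketch) - 1))

/* Hypothetical stand-in for the VMA: one word holds pointer and flags. */
struct vma_sketch {
	uintptr_t anon_vma;
};

/* Wrapper: callers see a plain pointer, never the flag bits. */
static inline struct anon_vma_sketch *
sketch_vma_anon_vma(const struct vma_sketch *vma)
{
	return (struct anon_vma_sketch *)
		(vma->anon_vma & ~SKETCH_ANON_VMA_FLAGS);
}

/* Wrapper: test the flag without exposing the raw word. */
static inline bool sketch_vma_unfaulted(const struct vma_sketch *vma)
{
	return (vma->anon_vma & SKETCH_ANON_VMA_UNFAULTED) != 0;
}

/*
 * Simplified fork-side decision: copy page tables if a real anon_vma
 * exists or if the flag asks for it. Only this caller looks at the flag.
 */
static inline bool
sketch_fork_must_copy_page_tables(const struct vma_sketch *vma)
{
	return sketch_vma_anon_vma(vma) != NULL ||
	       sketch_vma_unfaulted(vma);
}
```

Code that only ever goes through the pointer wrapper sees NULL for a
flagged but unfaulted VMA, which is why its behaviour is unchanged.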

On fault, or any operation that actually requires an established anon_vma,
the ANON_VMA_UNFAULTED flag is cleared and replaced by an actual anon_vma.

An additional advantage of having this mechanism is that we can also remove
this flag should no 'real' anon_vma have been established and the user
executes MADV_GUARD_REMOVE on the whole VMA, meaning we can prevent future
unneeded page table operations.
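
The flag's lifecycle described above (set on guard install, replaced by a
real anon_vma on fault, dropped on whole-VMA guard removal) can be
sketched in userspace as below. The function names, the flag value and the
use of calloc() are all hypothetical and only model the state transitions,
not the locking or allocation paths in the patches.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct anon_vma. */
struct anon_vma_sketch {
	long refcount;
};

#define SKETCH_UNFAULTED  ((uintptr_t)1)	/* assumed bit position */
#define SKETCH_FLAGS_MASK \
	((uintptr_t)(_Alignof(struct anon_vma_sketch) - 1))

/* Hypothetical stand-in for the VMA: one word holds pointer and flags. */
struct vma_sketch {
	uintptr_t anon_vma;
};

/* Guard install on an untouched VMA: set the flag, allocate nothing. */
static void sketch_guard_install(struct vma_sketch *vma)
{
	if (vma->anon_vma == 0)
		vma->anon_vma = SKETCH_UNFAULTED;
}

/*
 * Fault, or any operation needing a real anon_vma: allocate one and
 * store it, which overwrites the flag. Returns 0 on success, -1 if the
 * allocation fails.
 */
static int sketch_anon_vma_prepare(struct vma_sketch *vma)
{
	struct anon_vma_sketch *av;

	if (vma->anon_vma & ~SKETCH_FLAGS_MASK)
		return 0;	/* a real anon_vma is already present */

	av = calloc(1, sizeof(*av));
	if (!av)
		return -1;
	av->refcount = 1;
	vma->anon_vma = (uintptr_t)av;	/* aligned, so flag bits are clear */
	return 0;
}

/* Guard remove spanning the whole VMA: drop the flag if that is all we have. */
static void sketch_guard_remove_whole_vma(struct vma_sketch *vma)
{
	if ((vma->anon_vma & ~SKETCH_FLAGS_MASK) == 0)
		vma->anon_vma &= ~SKETCH_UNFAULTED;
}
```

In this model a VMA that is guarded and then fully unguarded without ever
faulting ends up back at zero, so fork has no reason to copy its page
tables.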

A benefit of this change, aside from saving kernel memory allocations, is
that THP page collapse is no longer impacted if we apply guard regions then
remove them in their entirety from a VMA, as otherwise the immediate
collapse of aligned page tables in retract_page_tables() cannot proceed.

Lorenzo Stoakes (2):
  mm: introduce anon_vma flags and use wrapper functions
  mm/madvise: utilise anon_vma unfaulted flag on guard region install

 fs/coredump.c                    |  2 +-
 include/linux/mm_types.h         | 67 ++++++++++++++++++++-
 include/linux/rmap.h             |  4 +-
 kernel/fork.c                    |  4 +-
 mm/debug.c                       |  6 +-
 mm/huge_memory.c                 |  4 +-
 mm/khugepaged.c                  | 12 ++--
 mm/ksm.c                         | 16 +++---
 mm/madvise.c                     | 49 ++++++++++------
 mm/memory.c                      |  6 +-
 mm/mmap.c                        |  2 +-
 mm/mprotect.c                    |  2 +-
 mm/mremap.c                      |  8 +--
 mm/rmap.c                        | 42 +++++++-------
 mm/swapfile.c                    |  2 +-
 mm/userfaultfd.c                 |  2 +-
 mm/vma.c                         | 99 +++++++++++++++++++++++++-------
 mm/vma.h                         |  6 +-
 security/selinux/hooks.c         |  2 +-
 tools/testing/vma/vma.c          | 95 +++++++++++++++---------------
 tools/testing/vma/vma_internal.h | 78 ++++++++++++++++++++++---
 21 files changed, 358 insertions(+), 150 deletions(-)

--
2.48.1



