On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen <toiwoton@xxxxxxxxx> wrote: > > Writing a new value of 3 to /proc/sys/kernel/randomize_va_space > enables full randomization of memory mappings created with mmap(NULL, > ...). With 2, the base of the VMA used for such mappings is random, > but the mappings are created in predictable places within the VMA and > in sequential order. With 3, new VMAs are created to fully randomize > the mappings. > > Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if not > necessary and the location of stack and vdso are also randomized. > > The method is to randomize the new address without considering > VMAs. If the address fails checks because of overlap with the stack > area (or in case of mremap(), overlap with the old mapping), the > operation is retried a few times before falling back to old method. > > On 32 bit systems this may cause problems due to increased VM > fragmentation if the address space gets crowded. > > On all systems, it will reduce performance and increase memory usage > due to less efficient use of page tables and inability to merge > adjacent VMAs with compatible attributes. In the worst case, > additional page table entries of up to 4 pages are created for each > mapping, so with small mappings there's considerable penalty. > > In this example with sysctl.kernel.randomize_va_space = 2, dynamic > loader, libc, anonymous memory reserved with mmap() and locale-archive > are located close to each other: > > $ cat /proc/self/maps (only first line for each object shown for brevity) > 5acea452d000-5acea452f000 r--p 00000000 fe:0c 1868624 /usr/bin/cat > 74f438f90000-74f4394f2000 r--p 00000000 fe:0c 2473999 /usr/lib/locale/locale-archive > 74f4394f2000-74f4395f2000 rw-p 00000000 00:00 0 > 74f4395f2000-74f439617000 r--p 00000000 fe:0c 2402332 /usr/lib/x86_64-linux-gnu/libc-2.31.so > 74f4397b3000-74f4397b9000 rw-p 00000000 00:00 0 > 74f4397e5000-74f4397e6000 r--p 00000000 fe:0c 2400754 /usr/lib/x86_64-linux-gnu/ld-2.31.so > 74f439811000-74f439812000 rw-p 00000000 00:00 0 > 7fffdca0d000-7fffdca2e000 rw-p 00000000 00:00 0 [stack] > 7fffdcb49000-7fffdcb4d000 r--p 00000000 00:00 0 [vvar] > 7fffdcb4d000-7fffdcb4f000 r-xp 00000000 00:00 0 [vdso] > > With sysctl.kernel.randomize_va_space = 3, they are located at > unrelated addresses and the order is random: > > $ echo 3 > /proc/sys/kernel/randomize_va_space > $ cat /proc/self/maps (only first line for each object shown for brevity) > 3850520000-3850620000 rw-p 00000000 00:00 0 > 28cfb4c8000-28cfb4cc000 r--p 00000000 00:00 0 [vvar] > 28cfb4cc000-28cfb4ce000 r-xp 00000000 00:00 0 [vdso] > 9e74c385000-9e74c387000 rw-p 00000000 00:00 0 > a42e0233000-a42e0234000 r--p 00000000 fe:0c 2400754 /usr/lib/x86_64-linux-gnu/ld-2.31.so > a42e025f000-a42e0260000 rw-p 00000000 00:00 0 > bea40427000-bea4044c000 r--p 00000000 fe:0c 2402332 /usr/lib/x86_64-linux-gnu/libc-2.31.so > bea405e8000-bea405ec000 rw-p 00000000 00:00 0 > f6d446fa000-f6d44c5c000 r--p 00000000 fe:0c 2473999 /usr/lib/locale/locale-archive > fcfbf684000-fcfbf6a5000 rw-p 00000000 00:00 0 [stack] > 619aba62d000-619aba62f000 r--p 00000000 fe:0c 1868624 /usr/bin/cat > > CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > CC: Jann Horn <jannh@xxxxxxxxxx> > CC: Kees Cook <keescook@xxxxxxxxxxxx> > CC: Matthew Wilcox <willy@xxxxxxxxxxxxx> > CC: Mike Rapoport <rppt@xxxxxxxxxx> > CC: Linux API <linux-api@xxxxxxxxxxxxxxx> > Signed-off-by: Topi Miettinen <toiwoton@xxxxxxxxx> > --- > v2: also randomize mremap(..., MREMAP_MAYMOVE) > v3: avoid stack area and retry in case of bad random address (Jann > Horn), improve description in kernel.rst (Matthew Wilcox) > v4: > - use /proc/$pid/maps in the example (Mike Rapaport) > - CCs (Andrew Morton) > - only check randomize_va_space == 3 > v5: randomize also vdso and stack > --- > Documentation/admin-guide/hw-vuln/spectre.rst | 6 ++-- > Documentation/admin-guide/sysctl/kernel.rst | 20 +++++++++++++ > arch/x86/entry/vdso/vma.c | 26 +++++++++++++++- > include/linux/mm.h | 8 +++++ > init/Kconfig | 2 +- > mm/mmap.c | 30 +++++++++++++------ > mm/mremap.c | 27 +++++++++++++++++ > mm/util.c | 6 ++++ > 8 files changed, 111 insertions(+), 14 deletions(-) > > diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst > index e05e581af5cf..9ea250522077 100644 > --- a/Documentation/admin-guide/hw-vuln/spectre.rst > +++ b/Documentation/admin-guide/hw-vuln/spectre.rst > @@ -254,7 +254,7 @@ Spectre variant 2 > left by the previous process will also be cleared. > > User programs should use address space randomization to make attacks > - more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2). > + more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3). > > 3. A virtualized guest attacking the host > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > @@ -499,8 +499,8 @@ Spectre variant 2 > more overhead and run slower. > > User programs should use address space randomization > - (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more > - difficult. > + (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks > + more difficult. > > 3. VM mitigation > ^^^^^^^^^^^^^^^^ > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index d4b32cc32bb7..806e3b29d2b5 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -1060,6 +1060,26 @@ that support this feature. > Systems with ancient and/or broken binaries should be configured > with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process > address space randomization. > + > +3 Additionally enable full randomization of memory mappings created > + with mmap(NULL, ...). With 2, the base of the VMA used for such > + mappings is random, but the mappings are created in predictable > + places within the VMA and in sequential order. With 3, new VMAs > + are created to fully randomize the mappings. > + > + Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if > + not necessary and the location of stack and vdso are also > + randomized. > + > + On 32 bit systems this may cause problems due to increased VM > + fragmentation if the address space gets crowded. > + > + On all systems, it will reduce performance and increase memory > + usage due to less efficient use of page tables and inability to > + merge adjacent VMAs with compatible attributes. In the worst case, > + additional page table entries of up to 4 pages are created for > + each mapping, so with small mappings there's considerable penalty. > + > == =========================================================================== > > > diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c > index 9185cb1d13b9..03ea884822e3 100644 > --- a/arch/x86/entry/vdso/vma.c > +++ b/arch/x86/entry/vdso/vma.c > @@ -12,6 +12,7 @@ > #include <linux/init.h> > #include <linux/random.h> > #include <linux/elf.h> > +#include <linux/elf-randomize.h> > #include <linux/cpu.h> > #include <linux/ptrace.h> > #include <linux/time_namespace.h> > @@ -32,6 +33,8 @@ > const size_t name ## _offset = offset; > #include <asm/vvar.h> > > +#define MAX_RANDOM_VDSO_RETRIES 5 > + > struct vdso_data *arch_get_vdso_data(void *vvar_page) > { > return (struct vdso_data *)(vvar_page + _vdso_data_offset); > @@ -361,7 +364,28 @@ static unsigned long vdso_addr(unsigned long start, unsigned len) > > static int map_vdso_randomized(const struct vdso_image *image) > { > - unsigned long addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start); > + unsigned long addr; > + > + if (randomize_va_space == 3) { > + /* > + * Randomize vdso address. > + */ > + int i = MAX_RANDOM_VDSO_RETRIES; > + > + do { > + int ret; > + > + /* Try a few times to find a free area */ > + addr = arch_mmap_rnd(); > + > + ret = map_vdso(image, addr); > + if (!IS_ERR_VALUE(ret)) > + return ret; > + } while (--i >= 0); > + > + /* Give up and try the less random way */ > + } > + addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start); This is IMO rather ugly. You're picking random numbers and throwing them at map_vdso(), which throws them at get_unmapped_area(), which will validate them. And you duplicate the same ugly loop later on. How about instead pushing this logic into get_unmapped_area()? --Andy