On Wed, Nov 15, 2017 at 03:10:42PM +0300, Kirill A. Shutemov wrote:
> On Wed, Nov 15, 2017 at 12:39:40PM +0100, Thomas Gleixner wrote:
> > On Wed, 15 Nov 2017, Kirill A. Shutemov wrote:
> > > On Wed, Nov 15, 2017 at 12:00:46AM +0100, Thomas Gleixner wrote:
> > > > On Wed, 15 Nov 2017, Kirill A. Shutemov wrote:
> > > > > On Tue, Nov 14, 2017 at 09:54:52PM +0100, Thomas Gleixner wrote:
> > > > > > On Tue, 14 Nov 2017, Kirill A. Shutemov wrote:
> > > > > >
> > > > > > > On Tue, Nov 14, 2017 at 05:01:50PM +0100, Thomas Gleixner wrote:
> > > > > > > > @@ -198,11 +199,14 @@ arch_get_unmapped_area_topdown(struct fi
> > > > > > > >  	/* requesting a specific address */
> > > > > > > >  	if (addr) {
> > > > > > > >  		addr = PAGE_ALIGN(addr);
> > > > > > > > +		if (!mmap_address_hint_valid(addr, len))
> > > > > > > > +			goto get_unmapped_area;
> > > > > > > > +
> > > > > > >
> > > > > > > Here and in hugetlb_get_unmapped_area(), we should align the addr after
> > > > > > > the check, not before. Otherwise the alignment itself can bring us over
> > > > > > > the borderline as we align up.
> > > > > >
> > > > > > Hmm, then I wonder whether the next check against vm_start_gap() which
> > > > > > checks against the aligned address is correct:
> > > > > >
> > > > > >                 addr = PAGE_ALIGN(addr);
> > > > > >                 vma = find_vma(mm, addr);
> > > > > >
> > > > > >                 if (end - len >= addr &&
> > > > > >                     (!vma || addr + len <= vm_start_gap(vma)))
> > > > > >                         return addr;
> > > > >
> > > > > I think the check is correct. The check is against resulting addresses
> > > > > that end up in vm_start/vm_end. In our case we want to figure out what
> > > > > user asked for.
> > > >
> > > > Well, but then checking just against the user supplied addr is only half of
> > > > the story.
> > > >
> > > >   addr = boundary - PAGE_SIZE - PAGE_SIZE / 2;
> > > >   len = PAGE_SIZE - PAGE_SIZE / 2;
> > > >
> > > > That fits, but then after alignment we end up with
> > > >
> > > >   addr = boundary - PAGE_SIZE;
> > > >
> > > > and due to len > PAGE_SIZE this will result in a mapping which crosses the
> > > > boundary, right? So checking against the PAGE_ALIGN(addr) should be the
> > > > right thing to do.
> > >
> > > IIUC, this is only the case if 'len' is not aligned, right?
> > >
> > > From what I see we expect caller to align it (and mm/mmap.c does this, I
> > > haven't checked other callers).
> > >
> > > And hugetlb would actively reject non-aligned len.
> > >
> > > I *think* we should be fine with checking unaligned 'addr'.
> >
> > I think we should keep it consistent for the normal and the huge case and
> > just check aligned and be done with it.
>
> Aligned 'addr'? Or 'len'? Both?
>
> We would have a problem with checking aligned addr. I stepped through it in
> the hugetlb case:
>
>   - User asks for mmap((1UL << 47) - PAGE_SIZE, 2 << 20, MAP_HUGETLB);
>
>   - On 4-level paging machine this gives us invalid hint address as
>     'TASK_SIZE - len' is less than 'addr'. Goto get_unmapped_area.
>
>   - On 5-level paging machine hint address gets rounded up to next 2MB
>     boundary that is exactly 1UL << 47 and we happily allocate from full
>     address space which may lead to trouble.

Below is updated patch with self-test.
Output on 5-level paging machine:

mmap(NULL): 0x7fbbad1f3000 - OK
mmap(LOW_ADDR): 0x40000000 - OK
mmap(HIGH_ADDR): 0x4000000000000 - OK
mmap(HIGH_ADDR) again: 0xffffbbad1fb000 - OK
mmap(HIGH_ADDR, MAP_FIXED): 0x4000000000000 - OK
mmap(-1): 0xffffbbad1f9000 - OK
mmap(-1) again: 0xffffbbad1f7000 - OK
mmap((1UL << 47), 2 * PAGE_SIZE): 0x7fbbad1f3000 - OK
mmap((1UL << 47), 2 * PAGE_SIZE / 2): 0x7fbbad1f1000 - OK
mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED): 0x7ffffffff000 - OK
mmap(NULL, MAP_HUGETLB): 0x7fbbac400000 - OK
mmap(LOW_ADDR, MAP_HUGETLB): 0x40000000 - OK
mmap(HIGH_ADDR, MAP_HUGETLB): 0x4000000000000 - OK
mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffbbace00000 - OK
mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0x4000000000000 - OK
mmap(-1, MAP_HUGETLB): (nil) - OK
mmap(-1, MAP_HUGETLB) again: 0x7fbbac400000 - OK
mmap((1UL << 47), 2UL << 20, MAP_HUGETLB): 0x800000000000 - FAILED
mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB): 0x7fffffe00000 - OK

So, only hugetlb is problematic. mmap() aligns addr to PAGE_SIZE. See
round_hint_to_min(). In this case we round address *down* and it works fine.

Replacing 'addr = ALIGN(addr, huge_page_size(h))' in hugetlbpage.c with
'addr &= huge_page_mask(h)' fixes the issue.

From 8645d0052b5919ee682a04f705f1668c2b281425 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Date: Wed, 8 Nov 2017 12:55:32 +0300
Subject: [PATCH] x86/selftests: Add test for mapping placement for 5-level paging

With 5-level paging, we have 56-bit virtual address space available for
userspace. But we don't want to expose userspace to addresses above
47-bits, unless it asked specifically for it.

We use mmap(2) hint address as a way for kernel to know if it's okay to
allocate virtual memory above 47-bit.

Let's add a self-test that covers few corner cases of the interface.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
---
 tools/testing/selftests/x86/5lvl.c   | 179 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/x86/Makefile |   2 +-
 2 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/5lvl.c

diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c
new file mode 100644
index 000000000000..6c396f0c869d
--- /dev/null
+++ b/tools/testing/selftests/x86/5lvl.c
@@ -0,0 +1,179 @@
+#include <stdio.h>
+#include <sys/mman.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define PAGE_SIZE	4096
+#define SIZE		(2 * PAGE_SIZE)
+#define LOW_ADDR	((void *) (1UL << 30))
+#define HIGH_ADDR	((void *) (1UL << 50))
+#define TASK_SIZE	((void *) (1UL << 47))
+
+struct testcase {
+	void *addr;
+	unsigned long size;
+	unsigned long flags;
+	const char *msg;
+	unsigned int low_addr_required:1;
+	unsigned int keep_mapped:1;
+};
+
+static struct testcase testcases[] = {
+	{
+		.addr = NULL,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE / 2),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)",
+	},
+	{
+		.addr = NULL,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - (2UL << 20)),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int i;
+	void *p;
+
+	for (i = 0; i < ARRAY_SIZE(testcases); i++) {
+		struct testcase *t = testcases + i;
+
+		p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0);
+
+		printf("%s: %p - ", t->msg, p);
+
+		if (p == MAP_FAILED) {
+			printf("FAILED\n");
+			continue;
+		}
+
+		if (t->low_addr_required && p >= (void *)(1UL << 47))
+			printf("FAILED\n");
+		else
+			printf("OK\n");
+		if (!t->keep_mapped)
+			munmap(p, t->size);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 7b1adeee4b0f..939a337128db 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
--
 Kirill A. Shutemov
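
For anyone who wants to check the rounding argument from this thread by hand, here is a minimal userspace sketch of the arithmetic. It is not kernel code: the demo_* helpers and DEMO_* constants are hypothetical names made up for illustration, and demo_reaches_boundary() is a deliberately simplified stand-in for the kernel's hint handling, assuming only that a hint at or above 1UL << 47 is treated as a request for the full address space. It shows that rounding the hint (1UL << 47) - PAGE_SIZE up to a 2MB huge page lands exactly on the borderline, while masking it down stays below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not taken from kernel headers). */
#define DEMO_PAGE_SIZE	4096ULL
#define DEMO_HPAGE_SIZE	(2ULL << 20)	/* 2MB huge page */
#define DEMO_BOUNDARY	(1ULL << 47)	/* 47-bit borderline */

/* Round up to a power-of-two alignment, in the spirit of ALIGN(). */
static uint64_t demo_align_up(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Round down, in the spirit of 'addr &= huge_page_mask(h)'. */
static uint64_t demo_align_down(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Simplified, hypothetical check: does the (already rounded) hint sit at
 * or above the 47-bit borderline? The real kernel logic is more involved.
 */
static bool demo_reaches_boundary(uint64_t addr)
{
	return addr >= DEMO_BOUNDARY;
}
```

With hint = DEMO_BOUNDARY - DEMO_PAGE_SIZE, demo_align_up(hint, DEMO_HPAGE_SIZE) returns DEMO_BOUNDARY, so demo_reaches_boundary() is true; demo_align_down() returns DEMO_BOUNDARY - DEMO_HPAGE_SIZE, which stays in the low range.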