On Tue, May 03, 2022, Ben Gardon wrote: > There's currently no test coverage of NX hugepages in KVM selftests, so > add a basic test to ensure that the feature works as intended. Please describe how the test actually validates the feature, here and/or as a file comment. Doesn't have to be super detailed, but people shouldn't have to read the code just to understand that the test uses stats to verify the KVM is (or isn't) splitting pages as expected. > Reviewed-by: David Matlack <dmatlack@xxxxxxxxxx> > Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx> > --- > tools/testing/selftests/kvm/Makefile | 10 + > .../selftests/kvm/include/kvm_util_base.h | 1 + > tools/testing/selftests/kvm/lib/kvm_util.c | 80 ++++++++ > .../selftests/kvm/x86_64/nx_huge_pages_test.c | 180 ++++++++++++++++++ > .../kvm/x86_64/nx_huge_pages_test.sh | 36 ++++ Test needs to be added to .gitignore. > 5 files changed, 307 insertions(+) > create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c > create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh ... > + * Input Args: > + * vm - the VM for which the stat should be read > + * stat_name - the name of the stat to read > + * max_elements - the maximum number of 8-byte values to read into data > + * > + * Output Args: > + * data - the buffer into which stat data should be read > + * > + * Return: > + * The number of data elements read into data or -ERRNO on error. > + * > + * Read the data values of a specified stat from the binary stats interface. > + */ > +static int __vm_get_stat(struct kvm_vm *vm, const char *stat_name, > + uint64_t *data, ssize_t max_elements) > +{ > + struct kvm_stats_desc *stats_desc; > + struct kvm_stats_header header; > + struct kvm_stats_desc *desc; > + size_t size_desc; > + int stats_fd; > + int ret = -EINVAL; > + int i; > + > + stats_fd = vm_get_stats_fd(vm); > + > + read_stats_header(stats_fd, &header); > + > + stats_desc = read_stats_desc(stats_fd, &header); > + > + size_desc = sizeof(struct kvm_stats_desc) + header.name_size; > + > + /* Read kvm stats data one by one */ Bad copy+paste. This doesn't read every stat, it reads only the specified stat. > + for (i = 0; i < header.num_desc; ++i) { > + desc = (void *)stats_desc + (i * size_desc); desc = get_stats_descriptor(vm->stats_desc, i, vm->stats_header); > + > + if (strcmp(desc->name, stat_name)) > + continue; > + > + ret = read_stat_data(stats_fd, &header, desc, data, > + max_elements); > + > + break; > + } > + > + free(stats_desc); > + close(stats_fd); > + return ret; > +} > + > +/* > + * Read the value of the named stat > + * > + * Input Args: > + * vm - the VM for which the stat should be read > + * stat_name - the name of the stat to read > + * > + * Output Args: None > + * > + * Return: > + * The value of the stat > + * > + * Reads the value of the named stat through the binary stat interface. If > + * the named stat has multiple data elements, only the first will be returned. > + */ > +uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name) One read_stat_data() doesn't return an int, this can be inlined. Yeah, it'll expose __vm_get_stat(), but IMO there's no point in having __vm_get_stat() unless it's exposed. At that point, drop the function comment and defer to the inner helper for that detailed info (or even drop it entirely). > +{ > + uint64_t data; > + int ret; > + > + ret = __vm_get_stat(vm, stat_name, &data, 1); > + TEST_ASSERT(ret == 1, > + "Stat %s expected to have 1 element, but %d returned", > + stat_name, ret); > + return data; > +} > diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c > new file mode 100644 > index 000000000000..238a6047791c > --- /dev/null > +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c > @@ -0,0 +1,180 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * tools/testing/selftests/kvm/nx_huge_page_test.c > + * > + * Usage: to be run via nx_huge_page_test.sh, which does the necessary > + * environment setup and teardown > + * > + * Copyright (C) 2022, Google LLC. > + */ > + > +#define _GNU_SOURCE > + > +#include <fcntl.h> > +#include <stdint.h> > +#include <time.h> > + > +#include <test_util.h> > +#include "kvm_util.h" > + > +#define HPAGE_SLOT 10 > +#define HPAGE_GVA (23*1024*1024) > +#define HPAGE_GPA (10*1024*1024) Is there any special meaning behind the addresses? If not, IMO it's safer to define the GPA to be 4gb, that way there's no chance of this colliding with memslot0 created by the framework. 10mb (if my math isn't terrible) isn't all that high. And unless the GPA and GVA need to be different for some reason, just identity map them. > +#define HPAGE_SLOT_NPAGES (512 * 3) > +#define PAGE_SIZE 4096 commit e852be8b148e ("kvm: selftests: introduce and use more page size-related constants") in kvm/master defines PAGE_SIZE. I bring that up, because rather than open code the "512" math in multiple places, I would much rather do something like #define PAGE_SIZE_2MB (PAGE_SIZE * 512) or whatever, and then use that. > + > +/* > + * Passed by nx_huge_pages_test.sh to provide an easy warning if this test is > + * being run without it. > + */ > +#define MAGIC_TOKEN 887563923 > + > +/* > + * x86 opcode for the return instruction. Used to call into, and then > + * immediately return from, memory backed with hugepages. > + */ > +#define RETURN_OPCODE 0xC3 > + > +/* > + * Exit the VM after each memory access so that the userspace component of the > + * test can make assertions about the pages backing the VM. > + * > + * See the below for an explanation of how each access should affect the > + * backing mappings. > + */ > +void guest_code(void) > +{ > + uint64_t hpage_1 = HPAGE_GVA; > + uint64_t hpage_2 = hpage_1 + (PAGE_SIZE * 512); > + uint64_t hpage_3 = hpage_2 + (PAGE_SIZE * 512); > + > + READ_ONCE(*(uint64_t *)hpage_1); > + GUEST_SYNC(1); > + > + READ_ONCE(*(uint64_t *)hpage_2); > + GUEST_SYNC(2); > + > + ((void (*)(void)) hpage_1)(); LOL, nice. It'd be very, very helpful for readers to add a helper for this, e.g. static guest_do_CALL(void *target) { ((void (*)(void)) target)(); } and then the usage should be a little more obvious. guest_do_CALL(hpage_1); > + GUEST_SYNC(3); > + > + ((void (*)(void)) hpage_3)(); > + GUEST_SYNC(4); > + > + READ_ONCE(*(uint64_t *)hpage_1); > + GUEST_SYNC(5); > + > + READ_ONCE(*(uint64_t *)hpage_3); > + GUEST_SYNC(6); > +} ... > +int main(int argc, char **argv) > +{ > + struct kvm_vm *vm; > + struct timespec ts; > + void *hva; > + > + if (argc != 2 || strtol(argv[1], NULL, 0) != MAGIC_TOKEN) { Since this will take multiple params, I think it makes senes to skip on: if (argc < 2 || strtol(argv[1], NULL, 0) != MAGIC_TOKEN) { And the error out on the remaining params. > + printf("This test must be run through nx_huge_pages_test.sh"); Needs a newline at the end. Even better, just use print_skip(). And I strongly prefer that the skip message complain about not getting the correct magic token, not about the script. It's totally valid to run the test without the script. I think it's a good idea to provide a redirect to the script, but yell about the magic token first. > + return KSFT_SKIP; > + } ... > + hva = addr_gpa2hva(vm, HPAGE_GPA); > + memset(hva, RETURN_OPCODE, HPAGE_SLOT_NPAGES * PAGE_SIZE); Heh, no need to set the entire page, only the first byte needs to be written. If you're going for paranoia, write 0xcc to the entire slot and then write only the first byte to RET. > + /* > + * Give recovery thread time to run. The wrapper script sets > + * recovery_period_ms to 100, so wait 5x that. > + */ > + ts.tv_sec = 0; > + ts.tv_nsec = 500000000; > + nanosleep(&ts, NULL); Please pass in the configured nx_huge_pages_recovery_period_ms, or alternatively read it from within the test. I assume the test will fail if the period is too low, so some sanity checks are likely in order. It'll be unfortunate if someone changes the script and forgets to update the test. > + > + /* > + * Now that the reclaimer has run, all the split pages should be gone. > + */ > + check_2m_page_count(vm, 1); > + check_split_count(vm, 0); > + > + /* > + * The 4k mapping on hpage 3 should have been removed, so check that > + * reading from it causes a huge page mapping to be installed. > + */ > + vcpu_run(vm, 0); > + check_2m_page_count(vm, 2); > + check_split_count(vm, 0); > + > + kvm_vm_free(vm); > + > + return 0; > +} > + > diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh > new file mode 100755 > index 000000000000..60bfed8181b9 > --- /dev/null > +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh > @@ -0,0 +1,36 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-2.0-only */ > +# > +# Wrapper script which performs setup and cleanup for nx_huge_pages_test. > +# Makes use of root privileges to set up huge pages and KVM module parameters. > +# > +# tools/testing/selftests/kvm/nx_huge_page_test.sh > +# Copyright (C) 2022, Google LLC. > + > +set -e > + > +NX_HUGE_PAGES=$(sudo cat /sys/module/kvm/parameters/nx_huge_pages) > +NX_HUGE_PAGES_RECOVERY_RATIO=$(sudo cat /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio) > +NX_HUGE_PAGES_RECOVERY_PERIOD=$(sudo cat /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms) > +HUGE_PAGES=$(sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages) I think we can omit sudo on these. Since we're assuming sysfs is mounted at the normal path, we can also likely assume it's mounted with normal permissions too, i.e. read-only for non-root users. > + > +set +e > + > +( > + set -e > + > + sudo echo 1 > /sys/module/kvm/parameters/nx_huge_pages "sudo echo" doesn't work, the redirection is done by the shell, not echo itself (which is run with sudo). This needs to be e.g. echo 1 | sudo tee -a /sys/module/kvm/parameters/nx_huge_pages > /dev/null > + sudo echo 1 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio > + sudo echo 100 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms > + sudo echo 3 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages So, what happens if the user is running something else that needs huge pages? Feels like the script should read the value and add three, not just blindly write '3'. > + > + "$(dirname $0)"/nx_huge_pages_test 887563923 > +) > +RET=$? > + > +sudo echo $NX_HUGE_PAGES > /sys/module/kvm/parameters/nx_huge_pages > +sudo echo $NX_HUGE_PAGES_RECOVERY_RATIO > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio > +sudo echo $NX_HUGE_PAGES_RECOVERY_PERIOD > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms > +sudo echo $HUGE_PAGES > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages > + > +exit $RET > -- > 2.36.0.464.gb9c8b46e94-goog >