On Fri, Jan 10, 2025 at 4:12 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > This can/should be posted separately, no? MGLRU support for secondary MMUs went > through mm/, the KVM changes in this series are all about making KVM x86 faster. I'll send this patch separately, sure. > > On Tue, Nov 05, 2024, James Houghton wrote: > > This test now has two modes of operation: > > Bad Google3, bad! Write changelogs as commands, i.e. state what the patch is > doing. The above is also misleading (at least, it was to me), as I assumed one > of the modes would be "legacy" and the other would be MGLRU. But it looks like > this patch adds MGLRU support *and* benchmarking. > > This should be split into at least two patches, possibly three (I can't tell how > much pre-work there is). I.e. add MGLRU support, and then add the benchmarking > stuff. And if there's substantial refactoring and/or new utilities, do that first. > > > 1. (default) To check how much vCPU performance was affected by access > > tracking (previously existed, now supports MGLRU aging). > > 2. (-p) To also benchmark how fast MGLRU can do aging while vCPUs are > > faulting in memory. > > > > Mode (1) also serves as a way to verify that aging is working properly > > for pages only accessed by KVM. It will fail if one does not have the > > "one" what? > > > 0x8 lru_gen feature bit. > > > > To support MGLRU, the test creates a memory cgroup, moves itself into > > it, then uses the lru_gen debugfs output to track memory in that cgroup. > > The logic to parse the lru_gen debugfs output has been put into > > selftests/kvm/lib/lru_gen_util.c. > > > > Co-developed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> > > Signed-off-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> > > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx> > > ... > > > @@ -47,6 +48,19 @@ > > #include "memstress.h" > > #include "guest_modes.h" > > #include "processor.h" > > +#include "lru_gen_util.h" > > + > > +static const char *TEST_MEMCG_NAME = "access_tracking_perf_test"; > > +static const int LRU_GEN_ENABLED = 0x1; > > +static const int LRU_GEN_MM_WALK = 0x2; > > Is there really no uAPI #define for these? > > > +static const char *CGROUP_PROCS = "cgroup.procs"; > > +/* > > + * If using MGLRU, this test assumes a cgroup v2 or cgroup v1 memory hierarchy > > I would say "requires", not "assumes". > > > + * is mounted at cgroup_root. > > + * > > + * Can be changed with -r. > > This is amusingly vague. I vote to omit explicitly referencing the command line > option, we have enough problems maintaining them as it is. Instead simply say > something like "Default to the kernel's preferred path for mounting cgroups" to > both explain where the default comes from, and to give the reader a hint that the > path can be changed. > > Actually, there's zero reason for this to be global. More below. > > Ugh, and if this test is manipulating cgroups, won't it need to root? I doubt > y'all have tried to run this outside of devrez, i.e. on a system where you're > not logged in as root. > > Oh, never mind, this test already effectively requires root. > > > + /* Whether to use lru_gen aging instead of idle page tracking. */ > > + bool lru_gen; > > Needs a verb, otherwise this looks like it tracks the LRU generation. E.g. use_lru_gen. > > > + > > + /* Whether to test the performance of aging itself. */ > > + bool benchmark_lru_gen; > > }; > > > > static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) > > @@ -89,6 +112,50 @@ static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) > > > > } > > > > +static void write_file_long(const char *path, long v) > > +{ > > + FILE *f; > > + > > + f = fopen(path, "w"); > > + TEST_ASSERT(f, "fopen(%s) failed", path); > > + TEST_ASSERT(fprintf(f, "%ld\n", v) > 0, > > + "fprintf to %s failed", path); > > + TEST_ASSERT(!fclose(f), "fclose(%s) failed", path); > > +} > > + > > +static char *path_join(const char *parent, const char *child) > > +{ > > + char *out = NULL; > > + > > + return asprintf(&out, "%s/%s", parent, child) >= 0 ? out : NULL; > > +} > > These are common utilities, no? I.e. should be somewhere common, not buried in > this test. > > > + > > +static char *memcg_path(const char *memcg) > > +{ > > + return path_join(cgroup_root, memcg); > > Eh, do the join when cgroup_root is first defined. Actually, looking at the > usage more closely, the cgroup path is only used during main(). Just do all of > the joins there, I see no reason to have these one-off helpers. > > > +} > > + > > +static char *memcg_file_path(const char *memcg, const char *file) > > +{ > > + char *mp = memcg_path(memcg); > > + char *fp; > > + > > + if (!mp) > > + return NULL; > > Returning NULL just so that the one user can assert on !NULL is rather pointless. > > > + fp = path_join(mp, file); > > + free(mp); > > + return fp; > > +} > > + > > +static void move_to_memcg(const char *memcg, pid_t pid) > > +{ > > + char *procs = memcg_file_path(memcg, CGROUP_PROCS); > > + > > + TEST_ASSERT(procs, "Failed to construct cgroup.procs path"); > > + write_file_long(procs, pid); > > + free(procs); > > +} > > + > > #define PAGEMAP_PRESENT (1ULL << 63) > > #define PAGEMAP_PFN_MASK ((1ULL << 55) - 1) > > > > @@ -242,6 +309,8 @@ static void vcpu_thread_main(struct memstress_vcpu_args *vcpu_args) > > }; > > > > vcpu_last_completed_iteration[vcpu_idx] = current_iteration; > > + clock_gettime(CLOCK_MONOTONIC, > > + &vcpu_last_completed_time[vcpu_idx]); > > } > > } > > > > @@ -253,38 +322,68 @@ static void spin_wait_for_vcpu(int vcpu_idx, int target_iteration) > > } > > } > > > > +static bool all_vcpus_done(int target_iteration, int nr_vcpus) > > +{ > > + for (int i = 0; i < nr_vcpus; ++i) > > Preferred style is to declare variables outside of loops. > > Needs curly braces. > > > + if (READ_ONCE(vcpu_last_completed_iteration[i]) != > > + target_iteration) > > Don't wrap. > > > + return false; > > + > > + return true; > > +} > > + > > /* The type of memory accesses to perform in the VM. */ > > enum access_type { > > ACCESS_READ, > > ACCESS_WRITE, > > }; > > > > -static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description) > > +static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description, > > + bool wait) > > { > > - struct timespec ts_start; > > - struct timespec ts_elapsed; > > int next_iteration, i; > > > > /* Kick off the vCPUs by incrementing iteration. */ > > next_iteration = ++iteration; > > > > - clock_gettime(CLOCK_MONOTONIC, &ts_start); > > - > > /* Wait for all vCPUs to finish the iteration. */ > > - for (i = 0; i < nr_vcpus; i++) > > - spin_wait_for_vcpu(i, next_iteration); > > + if (wait) { > > + struct timespec ts_start; > > + struct timespec ts_elapsed; > > + > > + clock_gettime(CLOCK_MONOTONIC, &ts_start); > > > > - ts_elapsed = timespec_elapsed(ts_start); > > - pr_info("%-30s: %ld.%09lds\n", > > - description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); > > + for (i = 0; i < nr_vcpus; i++) > > + spin_wait_for_vcpu(i, next_iteration); > > + > > + ts_elapsed = timespec_elapsed(ts_start); > > + > > + pr_info("%-30s: %ld.%09lds\n", > > + description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); > > + } else > > Needs curly braces. > > > + pr_info("%-30s\n", description); > > } > > > > -static void access_memory(struct kvm_vm *vm, int nr_vcpus, > > - enum access_type access, const char *description) > > +static void _access_memory(struct kvm_vm *vm, int nr_vcpus, > > Use double underscores (ignore any precedent in selftests that uses just one, > we're trying to purge that ugliness). > > > + enum access_type access, const char *description, > > + bool wait) > > { > > memstress_set_write_percent(vm, (access == ACCESS_READ) ? 0 : 100); > > iteration_work = ITERATION_ACCESS_MEMORY; > > - run_iteration(vm, nr_vcpus, description); > > + run_iteration(vm, nr_vcpus, description, wait); > > +} > > + > > +static void access_memory(struct kvm_vm *vm, int nr_vcpus, > > + enum access_type access, const char *description) > > +{ > > + return _access_memory(vm, nr_vcpus, access, description, true); > > +} > > + > > +static void access_memory_async(struct kvm_vm *vm, int nr_vcpus, > > Maybe "nowait" instead of "async"? Yeah, something ends up being asynchronous > (presumably), but the call itself is "synchronous", i.e. isn't spun off to a > worker or anything. > > > + enum access_type access, > > + const char *description) > > +{ > > + return _access_memory(vm, nr_vcpus, access, description, false); > > } > > > > static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) > > @@ -297,19 +396,115 @@ static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) > > */ > > pr_debug("Marking VM memory idle (slow)...\n"); > > iteration_work = ITERATION_MARK_IDLE; > > - run_iteration(vm, nr_vcpus, "Mark memory idle"); > > + run_iteration(vm, nr_vcpus, "Mark memory idle", true); > > } > > > > -static void run_test(enum vm_guest_mode mode, void *arg) > > +static void create_memcg(const char *memcg) > > +{ > > + const char *full_memcg_path = memcg_path(memcg); > > + int ret; > > + > > + TEST_ASSERT(full_memcg_path, "Failed to construct full memcg path"); > > +retry: > > + ret = mkdir(full_memcg_path, 0755); > > + if (ret && errno == EEXIST) { > > + TEST_ASSERT(!rmdir(full_memcg_path), > > + "Found existing memcg at %s, but rmdir failed", > > + full_memcg_path); > > + goto retry; > > while (1) { > ret = mkdir(full_memcg_path, 0755); > if (!ret) > break; > > TEST_ASSERT(errno == EEXIST, > Creating the memcg via 'mkdir(%s)' failed", > full_memcg_path); > > TEST_ASSERT(!rmdir(full_memcg_path), > "Found existing memcg at %s, but rmdir failed", > full_memcg_path); > } > > > + } > > + TEST_ASSERT(!ret, "Creating the memcg failed: mkdir(%s) failed", > > Heh, so it failed? > > > + full_memcg_path); > > + > > + pr_info("Created memcg at %s\n", full_memcg_path); > > +} > > ... > > > +/* > > + * Test how much access tracking affects vCPU performance. > > + * > > + * Supports two modes of access tracking: > > + * - idle page tracking > > + * - lru_gen aging > > + * > > + * When using lru_gen, this test additionally verifies that the pages are in > > + * fact getting younger and older, otherwise the performance data would be > > + * invalid. > > + * > > + * The forced lru_gen aging can race with aging that occurs naturally. > > + */ > > +static void run_test(enum vm_guest_mode mode, struct kvm_vm *vm, > > + struct test_params *params) > > +{ > > + int nr_vcpus = params->nr_vcpus; > > + bool lru_gen = params->lru_gen; > > + struct memcg_stats stats; > > + // If guest_page_size is larger than the host's page size, the > > + // guest (memstress) will only fault in a subset of the host's pages. > > No C++ comments, please. > > > + long total_pages = nr_vcpus * params->vcpu_memory_bytes / > > + max(memstress_args.guest_page_size, > > + (uint64_t)getpagesize()); > > max_t() is probably better. > > > + int found_gens[5]; > > > > pr_info("\n"); > > access_memory(vm, nr_vcpus, ACCESS_WRITE, "Populating memory"); > > @@ -319,11 +514,78 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from populated memory"); > > > > /* Repeat on memory that has been marked as idle. */ > > - mark_memory_idle(vm, nr_vcpus); > > + if (lru_gen) { > > + /* Do an initial page table scan */ > > + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); > > + TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages, > > + "Not all pages tracked in lru_gen stats.\n" > > + "Is lru_gen enabled? Did the memcg get created properly?"); > > Align indentation. > > TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages, > "Not all pages tracked in lru_gen stats.\n" > "Is lru_gen enabled? Did the memcg get created properly?"); > > > + > > + /* Find the generation we're currently in (probably youngest) */ > > + found_gens[0] = lru_gen_find_generation(&stats, total_pages); > > + > > + /* Do an aging pass now */ > > + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); > > + > > + /* Same generation, but a newer generation has been made */ > > + found_gens[1] = lru_gen_find_generation(&stats, total_pages); > > + TEST_ASSERT(found_gens[1] == found_gens[0], > > + "unexpected gen change: %d vs. %d", > > + found_gens[1], found_gens[0]); > > I don't have any ideas off the top of my head, but there's gotta be a way to > dedup these blocks. > > > + } else > > Needs curly braces. > > > + mark_memory_idle(vm, nr_vcpus); > > + > > access_memory(vm, nr_vcpus, ACCESS_WRITE, "Writing to idle memory"); > > - mark_memory_idle(vm, nr_vcpus); > > + > > + if (lru_gen) { > > + /* Scan the page tables again */ > > + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); > > + > > + /* The pages should now be young again, so in a newer generation */ > > + found_gens[2] = lru_gen_find_generation(&stats, total_pages); > > + TEST_ASSERT(found_gens[2] > found_gens[1], > > + "pages did not get younger"); > > + > > + /* Do another aging pass */ > > + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); > > + > > + /* Same generation; new generation has been made */ > > + found_gens[3] = lru_gen_find_generation(&stats, total_pages); > > + TEST_ASSERT(found_gens[3] == found_gens[2], > > + "unexpected gen change: %d vs. %d", > > + found_gens[3], found_gens[2]); > > + } else > > Once more, with feeling! > > > + mark_memory_idle(vm, nr_vcpus); > > + > > access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory"); > > > > + if (lru_gen) { > > + /* Scan the pages tables again */ > > + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); > > + > > + /* The pages should now be young again, so in a newer generation */ > > + found_gens[4] = lru_gen_find_generation(&stats, total_pages); > > + TEST_ASSERT(found_gens[4] > found_gens[3], > > + "pages did not get younger"); > > + } > > +} > > ... > > > @@ -353,13 +618,15 @@ int main(int argc, char *argv[]) > > .backing_src = DEFAULT_VM_MEM_SRC, > > .vcpu_memory_bytes = DEFAULT_PER_VCPU_MEM_SIZE, > > .nr_vcpus = 1, > > + .lru_gen = false, > > + .benchmark_lru_gen = false, > > }; > > int page_idle_fd; > > int opt; > > > > guest_modes_append_default(); > > > > - while ((opt = getopt(argc, argv, "hm:b:v:os:")) != -1) { > > + while ((opt = getopt(argc, argv, "hm:b:v:os:lr:p")) != -1) { > > switch (opt) { > > case 'm': > > guest_modes_cmdline(optarg); > > @@ -376,6 +643,15 @@ int main(int argc, char *argv[]) > > case 's': > > params.backing_src = parse_backing_src_type(optarg); > > break; > > + case 'l': > > + params.lru_gen = true; > > + break; > > + case 'p': > > + params.benchmark_lru_gen = true; > > + break; > > + case 'r': > > + cgroup_root = strdup(optarg); > > + break; > > case 'h': > > default: > > help(argv[0]); > > @@ -383,12 +659,42 @@ int main(int argc, char *argv[]) > > } > > } > > > > - page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); > > - __TEST_REQUIRE(page_idle_fd >= 0, > > - "CONFIG_IDLE_PAGE_TRACKING is not enabled"); > > - close(page_idle_fd); > > + if (!params.lru_gen) { > > + page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); > > + __TEST_REQUIRE(page_idle_fd >= 0, > > + "CONFIG_IDLE_PAGE_TRACKING is not enabled"); > > + close(page_idle_fd); > > + } else { > > + int lru_gen_fd, lru_gen_debug_fd; > > + long mglru_features; > > + char mglru_feature_str[8] = {}; > > + > > + lru_gen_fd = open("/sys/kernel/mm/lru_gen/enabled", O_RDONLY); > > + __TEST_REQUIRE(lru_gen_fd >= 0, > > + "CONFIG_LRU_GEN is not enabled"); > > Noooo! Do not assume opening a path failed because some config option. That > reminds me, this needs to be rewritten. I can't count the number of times this > stupid test has oh so helpfully told me CONFIG_IDLE_PAGE_TRACKING is disabled, > when in fact the problem is that I didn't run it as root. > > page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); > __TEST_REQUIRE(page_idle_fd >= 0, > "CONFIG_IDLE_PAGE_TRACKING is not enabled"); > > > By all means, give the user a hint, but don't assume anything. E.g. > > page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); > __TEST_REQUIRE(page_idle_fd >= 0, > "Failed to open blah blah blah, is CONFIG_IDLE_PAGE_TRACKING enabled?"); > > > + TEST_ASSERT(read(lru_gen_fd, &mglru_feature_str, 7) > 0, > > + "couldn't read lru_gen features"); > > + mglru_features = strtol(mglru_feature_str, NULL, 16); > > + __TEST_REQUIRE(mglru_features & LRU_GEN_ENABLED, > > + "lru_gen is not enabled"); > > + __TEST_REQUIRE(mglru_features & LRU_GEN_MM_WALK, > > + "lru_gen does not support MM_WALK"); > > + > > + lru_gen_debug_fd = open(DEBUGFS_LRU_GEN, O_RDWR); > > + __TEST_REQUIRE(lru_gen_debug_fd >= 0, > > + "Cannot access %s", DEBUGFS_LRU_GEN); > > + close(lru_gen_debug_fd); > > + } > > + > > + TEST_ASSERT(!params.benchmark_lru_gen || params.lru_gen, > > + "-p specified without -l"); > > + > > + if (params.lru_gen) { > > + create_memcg(TEST_MEMCG_NAME); > > + move_to_memcg(TEST_MEMCG_NAME, getpid()); > > After this, cgroup_root is never used. Hint, hint. > > > + } > > > > - for_each_guest_mode(run_test, ¶ms); > > + for_each_guest_mode(setup_vm_and_run, ¶ms); > > > > return 0; > > } > > diff --git a/tools/testing/selftests/kvm/include/lru_gen_util.h b/tools/testing/selftests/kvm/include/lru_gen_util.h > > new file mode 100644 > > index 000000000000..4eef8085a3cb > > --- /dev/null > > +++ b/tools/testing/selftests/kvm/include/lru_gen_util.h > > @@ -0,0 +1,55 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* > > + * Tools for integrating with lru_gen, like parsing the lru_gen debugfs output. > > + * > > + * Copyright (C) 2024, Google LLC. > > + */ > > +#ifndef SELFTEST_KVM_LRU_GEN_UTIL_H > > +#define SELFTEST_KVM_LRU_GEN_UTIL_H > > + > > +#include <inttypes.h> > > +#include <limits.h> > > +#include <stdlib.h> > > + > > +#include "test_util.h" > > + > > +#define MAX_NR_GENS 16 /* MAX_NR_GENS in include/linux/mmzone.h */ > > +#define MAX_NR_NODES 4 /* Maximum number of nodes we support */ > > Who is "we"? KVM selftests? Kernel? > > + > > +static const char *DEBUGFS_LRU_GEN = "/sys/kernel/debug/lru_gen"; > > So, root again? > > > + > > +struct generation_stats { > > + int gen; > > + long age_ms; > > + long nr_anon; > > + long nr_file; > > +}; > > + > > +struct node_stats { > > + int node; > > + int nr_gens; /* Number of populated gens entries. */ > > + struct generation_stats gens[MAX_NR_GENS]; > > +}; > > + > > +struct memcg_stats { > > + unsigned long memcg_id; > > + int nr_nodes; /* Number of populated nodes entries. */ > > + struct node_stats nodes[MAX_NR_NODES]; > > +}; > > + > > +void print_memcg_stats(const struct memcg_stats *stats, const char *name); > > + > > +void read_memcg_stats(struct memcg_stats *stats, const char *memcg); > > + > > +void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg); > > These need lru_gen_ namespacing. As is, they are very, very misleading names. > They don't read core memcg stuff, they read LRU_GEN stuff, which AFAICT happens > to be indexed by the memcg name. > > > +static void memcg_stats_handle_in_node(struct memcg_stats *stats, > > + struct memcg_stats_parse_context *ctx, > > + char *line) > > +{ > > + /* Have to copy since we might not consume */ > > Huh? > > > + char *my_line = strdup(line); > > + struct split_iterator it = { .str = my_line }; > > + char *gen, *age, *nr_anon, *nr_file; > > + struct node_stats *node_stats; > > + struct generation_stats *gen_stats; > > + char *end; > > + > > + TEST_ASSERT(it.str, "failed to copy input line"); > > + > > + gen = split_next(&it); > > + > > + /* Skip empty lines */ > > + if (!gen) > > + goto out_consume; /* Skip empty lines */ > > If you say it three times, it might happen! (Can you tell it's Friday afternoon?) > > > + if (!strcmp("memcg", gen) || !strcmp("node", gen)) { > > + /* > > + * Reached next memcg or node section. Don't consume, let the > > + * other handler deal with this. > > + */ > > + ctx->next_handler = memcg_stats_handle_in_memcg; > > + goto out; > > + } > > ... > > > +void print_memcg_stats(const struct memcg_stats *stats, const char *name) > > +{ > > + int node, gen; > > + > > + fprintf(stderr, "stats for memcg %s (id %lu):\n", > > Why stderr? This is effectively wrapped with DEBUG, so why not pr_debug()? > > > + name, stats->memcg_id); > > + for (node = 0; node < stats->nr_nodes; ++node) { > > + fprintf(stderr, "\tnode %d\n", stats->nodes[node].node); > > + for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) { > > + const struct generation_stats *gstats = > > + &stats->nodes[node].gens[gen]; > > + > > + fprintf(stderr, > > + "\t\tgen %d\tage_ms %ld" > > + "\tnr_anon %ld\tnr_file %ld\n", > > + gstats->gen, gstats->age_ms, gstats->nr_anon, > > + gstats->nr_file); > > + } > > + } > > +} > > + > > +/* Re-read lru_gen debugfs information for @memcg into @stats. */ > > +void read_memcg_stats(struct memcg_stats *stats, const char *memcg) > > +{ > > + FILE *f; > > + ssize_t read = 0; > > + char *line = NULL; > > + size_t bufsz; > > + struct memcg_stats_parse_context ctx = { > > + .next_handler = memcg_stats_handle_searching, > > + .name = memcg, > > + }; > > + > > + memset(stats, 0, sizeof(struct memcg_stats)); > > + > > + f = fopen(DEBUGFS_LRU_GEN, "r"); > > + TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN); > > + > > + while (ctx.next_handler && (read = getline(&line, &bufsz, f)) > 0) { > > + ctx.consumed = false; > > + > > + do { > > + ctx.next_handler(stats, &ctx, line); > > + if (!ctx.next_handler) > > + break; > > + } while (!ctx.consumed); > > + } > > + > > + if (read < 0 && !feof(f)) > > + TEST_ASSERT(false, "getline(%s) failed", DEBUGFS_LRU_GEN); > > + > > + TEST_ASSERT(stats->memcg_id > 0, "Couldn't find memcg: %s\n" > > + "Did the memcg get created in the proper mount?", > > + memcg); > > + if (line) > > + free(line); > > + TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN); > > +} > > + > > +/* > > + * Find all pages tracked by lru_gen for this memcg in generation @target_gen. > > + * > > + * If @target_gen is negative, look for all generations. > > + */ > > +static long sum_memcg_stats_for_gen(int target_gen, > > + const struct memcg_stats *stats) > > +{ > > + int node, gen; > > + long total_nr = 0; > > + > > + for (node = 0; node < stats->nr_nodes; ++node) { > > + const struct node_stats *node_stats = &stats->nodes[node]; > > + > > + for (gen = 0; gen < node_stats->nr_gens; ++gen) { > > + const struct generation_stats *gen_stats = > > + &node_stats->gens[gen]; > > + > > + if (target_gen >= 0 && gen_stats->gen != target_gen) > > + continue; > > + > > + total_nr += gen_stats->nr_anon + gen_stats->nr_file; > > + } > > + } > > + > > + return total_nr; > > +} > > + > > +/* Find all pages tracked by lru_gen for this memcg. */ > > +long sum_memcg_stats(const struct memcg_stats *stats) > > +{ > > + return sum_memcg_stats_for_gen(-1, stats); > > +} > > + > > +/* Read the memcg stats and optionally print if this is a debug build. */ > > +void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg) > > +{ > > + read_memcg_stats(stats, memcg); > > +#ifdef DEBUG > > + print_memcg_stats(stats, memcg); > > print_memcg_stats() should be static, because this is the only caller. But I'm > guessing you made it globally visible so that the compiler wouldn't complain > about unused functions. A better approach would be to wrap the guts with the > #ifdef. > > > +#endif > > +} > > + > > +/* > > + * If lru_gen aging should force page table scanning. > > + * > > + * If you want to set this to false, you will need to do eviction > > + * before doing extra aging passes. > > + */ > > +static const bool force_scan = true; > > + > > +static void run_aging_impl(unsigned long memcg_id, int node_id, int max_gen) > > +{ > > + FILE *f = fopen(DEBUGFS_LRU_GEN, "w"); > > + char *command; > > + size_t sz; > > + > > + TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN); > > + sz = asprintf(&command, "+ %lu %d %d 1 %d\n", > > + memcg_id, node_id, max_gen, force_scan); > > + TEST_ASSERT(sz > 0, "creating aging command failed"); > > + > > + pr_debug("Running aging command: %s", command); > > + if (fwrite(command, sizeof(char), sz, f) < sz) { > > + TEST_ASSERT(false, "writing aging command %s to %s failed", > > + command, DEBUGFS_LRU_GEN); > > + } > > + > > + TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN); > > +} > > + > > +static void _lru_gen_do_aging(struct memcg_stats *stats, const char *memcg, > > Two underscores. > > > +/* Do aging, and print how long it took. */ > > +void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg) > > +{ > > + return _lru_gen_do_aging(stats, memcg, true); > > +} > > + > > +/* Do aging, don't print anything. */ > > +void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg) > > +{ > > + return _lru_gen_do_aging(stats, memcg, false); > > static inline helpers in the header? Having to come all this way to see that > these are simple wrapper is annoying. > > > +} > > + > > +/* > > + * Find which generation contains more than half of @total_pages, assuming that > > + * such a generation exists. > > + */ > > +int lru_gen_find_generation(const struct memcg_stats *stats, > > + unsigned long total_pages) > > +{ > > + int node, gen, gen_idx, min_gen = INT_MAX, max_gen = -1; > > + > > + for (node = 0; node < stats->nr_nodes; ++node) > > Curly braces. > > > + for (gen_idx = 0; gen_idx < stats->nodes[node].nr_gens; > > + ++gen_idx) { > > + gen = stats->nodes[node].gens[gen_idx].gen; > > + max_gen = gen > max_gen ? gen : max_gen; > > + min_gen = gen < min_gen ? gen : min_gen; > > + } > > + > > + for (gen = min_gen; gen < max_gen; ++gen) > > + /* See if the most pages are in this generation. */ > > + if (sum_memcg_stats_for_gen(gen, stats) > > > + total_pages / 2) > > + return gen; > > + > > + TEST_ASSERT(false, "No generation includes majority of %lu pages.", > > TEST_FAIL. > > > + total_pages); > > + > > + /* unreachable, but make the compiler happy */ > > + return -1; > > I _think_ selftests can use unreachable(). > > > +} > > -- > > 2.47.0.199.ga7371fff76-goog > > Thanks for all the feedback. I'll apply all of it (and Yosry's suggestion to use cgroup_util.h / otherwise de-duplicate the cgroup logic, thanks Yosry!) for the next version of this patch. For now I'll go ahead and send v9 of the main series.