On Sat, Apr 23, 2022 at 7:28 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > On Thu, Apr 21, 2022 at 11:44:26PM +0000, Yosry Ahmed wrote: > > Add a new test for memory.reclaim that verifies that the interface > > correctly reclaims memory as intended, from both anon and file pages. > > > > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx> > > --- > > .../selftests/cgroup/test_memcontrol.c | 86 +++++++++++++++++++ > > 1 file changed, 86 insertions(+) > > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c > > index f2ffb3a30194..5f7c20de2426 100644 > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c > > @@ -760,6 +760,91 @@ static int test_memcg_max(const char *root) > > return ret; > > } > > > > +/* > > + * This test checks that memory.reclaim reclaims the given > > + * amount of memory (from both anon and file). > > + */ > > +static int test_memcg_reclaim(const char *root) > > +{ > > + int ret = KSFT_FAIL, fd, retries; > > + char *memcg; > > + long current, to_reclaim; > > + char buf[64]; > > + > > + memcg = cg_name(root, "memcg_test"); > > + if (!memcg) > > + goto cleanup; > > + > > + if (cg_create(memcg)) > > + goto cleanup; > > + > > + current = cg_read_long(memcg, "memory.current"); > > + if (current != 0) > > + goto cleanup; > > + > > + cg_run_nowait(memcg, alloc_anon_noexit, (void *) MB(50)); > > Don't you need is_swap_enabled() check before deciding to do the anon > allocations? Yes you are right. In the next version will check whether or not swap is enabled and modify the test accordingly. > > + sleep(1); > > + > > + fd = get_temp_fd(); > > + if (fd < 0) > > + goto cleanup; > > + > > + cg_run_nowait(memcg, alloc_pagecache_50M_noexit, (void *)(long)fd); > > + sleep(1); > > These sleep(1)s do not seem robust. Since kernel keeps the page cache > around, you can convert anon to use tmpfs and use simple cg_run to > trigger the allocations of anon (tmpfs) and file which will remain in > memory even after return from cg_run. Other tests in the file are also using sleep approach (see test_memcg_min, although it retries for multiple times until memory.current reaches an expected amount). In my experience it hasn't been flaky running for multiple times on different machines, but I agree it can be flaky (false negative). I am not sure about the allocating file pages with cg_run, is it guaranteed that the page cache will remain in memory until the test ends? If it doesn't, it can also flake, but it would produce false positives (the test could pass because the kernel drained page cache for some other reason although the interface is not working correctly). In my personal opinion, false negative flakes are better than false positives. At least currently the test explicitly and clearly fails if the allocations are not successful. If we rely on the page cache remaining until the test finishes then it could silently pass if the interface is not working correctly. There are a few ways we can go forward with this: 1) Keep everything as-is, but print a message if the test fails due to memory.current not reaching 100MB to make it clear that it didn't fail due to a problem with the interface. 2) Add a sleep/retry loop similar to test_memcg_min instead of sleeping once. 3) Send a signal from forked children when they are done with the allocation, and wait to receive this signal in the test to make sure the allocation is completed. In my opinion we should do (1) (and maybe (2)) for now as (3) could be an overkill if the test is normal passing. Maybe add a comment about (3) being an option in the future if the test flakes. Let me know what you think?