On Fri, Aug 04, 2023 at 02:59:28PM -0400, Lucas Karpinski wrote:
> On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
> > On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> > > The test allocates dcache inside a cgroup, then destroys the cgroups and
> > > then checks the sanity of numbers on the parent level. The reason it
> > > fails is because dentries are freed with an RCU delay - a debugging
> > > sleep shows that usage drops as expected shortly after.
> > >
> > > Insert a 1s sleep after completing the cgroup creation/deletions. This
> > > should be good enough, assuming that machines running those tests are
> > > otherwise not very busy. This commit is directly inspired by Johannes
> > > over at the link below.
> > >
> > > Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@xxxxxxxxxxx/
> > >
> > > Signed-off-by: Lucas Karpinski <lkarpins@xxxxxxxxxx>
> >
> > Maybe I'm missing something, but there isn't a limit set anywhere that
> > would cause the dentries to be reclaimed and freed, no? When the
> > subgroups are deleted, the objects are just moved to the parent. The
> > counters inside the parent (which are hierarchical) shouldn't change.
> >
> > So this seems to be a different scenario than test_kmem_basic. If the
> > test is failing for you, I can't quite see why.
> >
> You're right, the parent inherited the counters and it should behave
> the same whether I'm directly removing the child or if I was moving it
> under another cgroup. I do see the behaviour you described on my
> x86_64 setup, but the wrong behaviour on my aarch64 dev. platform. I'll
> take a closer look, but just wanted to leave an example here of what I
> see.
>
> Example of slab size pre/post sleep:
> slab_pre = 18164688, slab_post = 3360000
>
> Thanks,
> Lucas

Looked into the failures and I do have a proposed solution, just want some
feedback first.
Given how the kernel entry in memory.stat is updated, it accounts for all
charged and uncharged pages, so it looks like it makes more sense to use
that single entry rather than
`slab + anon + file + kernel_stack + pagetables + percpu + sock', as it
would cover all utilization.

Thanks,
Lucas
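
For illustration only, here is a rough sketch of the two quantities being
discussed: the per-item sum and the single kernel entry, both read from
memory.stat-style text. This is not the selftest code or a proposed patch.
The helper names (parse_memory_stat, item_sum) are hypothetical, and the
sample values are made up apart from slab, which reuses the slab_post
number quoted above. How the kernel entry would be used in the actual
check is left open here.

```python
# Items that make up the per-item sum mentioned in this thread.
ITEMS = ("slab", "anon", "file", "kernel_stack", "pagetables", "percpu", "sock")


def parse_memory_stat(text):
    """Parse 'key value' lines of memory.stat-style text into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def item_sum(stats):
    """Sum the individual entries; entries missing from stats count as 0."""
    return sum(stats.get(key, 0) for key in ITEMS)


# Made-up sample contents (assumption, not real output from any machine).
SAMPLE = """\
anon 4096
file 8192
kernel 3500000
kernel_stack 16384
pagetables 32768
percpu 65536
sock 0
slab 3360000
"""

stats = parse_memory_stat(SAMPLE)
print("per-item sum:", item_sum(stats))
print("kernel entry:", stats["kernel"])
```

On a real system the text would come from the parent cgroup's memory.stat
file rather than from a hard-coded string.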