On Thu, Aug 29, 2024 at 8:22 PM liujing <liujing@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> hello,linux boss
>
> I found a problem in the process of using linux memcg,When I turned swap off, the memcg memory I created with the following script could not be deleted with echo 0 > memory.force_empty, as explained below.

(Adding memcg maintainers in case they are interested)

It's not a problem, it's the way the linux kernel currently behaves in
terms of handling deleted memcgs that are still referenced in the kernel
(i.e. offline/dying/zombie memcgs).

>
> ----------------------------------------------------------------------------------------------------------
> step1:swapoff -a
>
>
> step2:use this script to create memcg
>
> #!/bin/bash
> mkdir -p /tmp/test
> for i in `seq 2000`
> do
> sudo mkdir -p /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}
> sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}/tasks
> sudo echo 'data' > /tmp/test/test${i}

Assuming /tmp is a tmpfs mount, here you created 2000 child memcgs and
allocated one tmpfs page in each of them. So each of those child memcgs
is charged for one page of memory, and each charge holds a reference to
the respective memcg.

> sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/tasks
> sudo rmdir /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}

Then you deleted those memcgs, but the kernel cannot free them yet
because the tmpfs memory you allocated above is still charged to them.

> done
>
>
> step3:view /proc/cgroup and /proc/meminfo files
>
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 2009 1

Here you can see the cgroups you deleted still exist in the kernel.

> devices 6 94 1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 600576 kB

The percpu memory you observe here is likely the per-CPU metadata that
the kernel uses to keep track of each memcg.
Since the memcgs are not freed, the metadata is not freed either.

>
>
> step4:when I use "echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty", I find the num_cgroups of memory and percpu have no changed

Yes, because at this point there is no swap, so the tmpfs memory charged
to the deleted memcg cannot be reclaimed and cannot be freed, and the
refs they hold cannot be dropped.

>
> [root@localhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 2039 1
> devices 6 87 1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 600576 kB
>
>
> step 5: when I use swapon -a to open swap, then echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty again
>
> [root@localhost ~]# swapon -a
> [root@localhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty

When you add a swapfile and try to reclaim memory from the cgroups
again, the kernel is able to reclaim the tmpfs memory by swapping it
out. The kernel is smart enough at this point to not charge the swap
slots to the deleted cgroups, but to their living/online parent. At this
point, the tmpfs memory is uncharged and freed, and the refs to the
deleted cgroups are dropped. Now they can be deleted by the kernel.

>
>
> step 6: view /proc/cgroup and /proc/meminfo files ,I found the the num_cgroups of memory and percpu have been reduced.
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 185 1
> devices 6 87 1
> freezer 9 1 1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 120832 kB

Now the memcgs are freed, and their associated per-CPU metadata is also
freed.
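
If it helps to watch these two numbers together while experimenting,
here is a rough sketch of two small helpers. The function names
memcg_count and percpu_kb are made up for illustration. They only read
the "memory" row of /proc/cgroups and the "Percpu:" row of
/proc/meminfo, the same fields shown in your output above, and they take
an optional file argument so they can be tried on saved copies.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not an existing tool).

# Print the num_cgroups column (3rd field) of the memory controller.
# $1: file in /proc/cgroups format (defaults to /proc/cgroups).
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Print the Percpu value in kB (2nd field of the "Percpu:" row).
# $1: file in /proc/meminfo format (defaults to /proc/meminfo).
percpu_kb() {
    awk '$1 == "Percpu:" { print $2 }' "${1:-/proc/meminfo}"
}

# Example usage on a live system, before and after force_empty:
#   echo "memcgs: $(memcg_count)  percpu: $(percpu_kb) kB"
```

A num_cgroups value that stays far above the number of directories you
can see under /sys/fs/cgroup/memory is a hint that deleted memcgs are
still being held by charges.
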
> --------------------------------------------------------------------------------------------------------
>
>
> Therefore, I want to know why swap affects memcg memory reclamation, echo 0 > memory.force_empty this interface should force the memory used by the cgroup to be reclaimed.
> I want to know why ,I look forward to hearing back from the community.

I hope it's now clear that the per-CPU memory cannot be freed when you
use memory.force_empty on the parent memcg, because the per-CPU memory
is the metadata of the deleted memcgs, and those cannot be freed until
the memory charged to them is freed (which needs swap, because it's
tmpfs not a regular file).
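
All of the above rests on the assumption that /tmp is a tmpfs mount on
your machine. A minimal sketch for checking that, assuming /tmp is its
own mount point: the helper name mount_fstype is made up for
illustration, and it simply looks up the filesystem type (3rd field) of
an exact mount point (2nd field) in a /proc/mounts style file.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not an existing tool).

# Print the filesystem type of an exact mount point, or nothing if the
# path is not itself a mount point.
# $1: mount point to look up, e.g. /tmp
# $2: file in /proc/mounts format (defaults to /proc/mounts)
mount_fstype() {
    # If the path is mounted more than once, the last entry wins,
    # since later mounts sit on top of earlier ones.
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Example usage on a live system:
#   mount_fstype /tmp
```

If this prints nothing for /tmp, then /tmp is not a separate mount and
the files live on whatever filesystem backs the parent directory, in
which case the explanation above may not apply as written.
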