Hi Kairui,

On Mon, Nov 20, 2023 at 3:17 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> Huang, Ying <ying.huang@xxxxxxxxx> wrote on Mon, Nov 20, 2023 at 15:37:
> >
> > Kairui Song <ryncsn@xxxxxxxxx> writes:
> >
> > > From: Kairui Song <kasong@xxxxxxxxxxx>
> > >
> > > When a process which previously swapped some memory was moved to
> > > another cgroup, and the cgroup it was previously in is dead, then
> > > swapped in pages will be leaked into rootcg. Previous commits fixed
> > > the bug for the no readahead path, this commit fixes the same issue
> > > for the readahead path.
> > >
> > > This can be easily reproduced by:
> > > - Set up an SSD or HDD swap.
> > > - Create memory cgroup A, B and C.
> > > - Spawn process P1 in cgroup A and make it swap out some pages.
> > > - Move process P1 to memory cgroup B.
> > > - Destroy cgroup A.
> > > - Do a swapoff in cgroup C.
> > > - Swapped in pages are accounted into cgroup C.
> > >
> > > This patch fixes it by making the swapped in pages accounted in
> > > cgroup B.
> >
> > According to the "Memory Ownership" section of
> > Documentation/admin-guide/cgroup-v2.rst,
> >
> > "
> > A memory area is charged to the cgroup which instantiated it and stays
> > charged to the cgroup until the area is released. Migrating a process
> > to a different cgroup doesn't move the memory usages that it
> > instantiated while in the previous cgroup to the new cgroup.
> > "
> >
> > Because we don't move the charge when we move a task from one cgroup
> > to another, it's controversial which cgroup should be charged.
> > According to the above document, it's acceptable to charge to the
> > cgroup C (cgroup where swapoff happens).
>
> Hi Ying, thank you very much for the info!
>
> It is controversial indeed, just the original behavior is kind of
> counter-intuitive.
>
> Imagine there is a cgroup P1 with child cgroups C1 and C2. If a
> process swapped out some memory in C1, then moved to C2, and C1 is
> dead, on swapoff the charge will be moved out of P1...
>
> And swapoff often happens in some unlimited cgroup or some cgroup for
> a management agent.
>
> If P1 has a memory limit, it can breach the limit easily: we will see
> a process that never leaves P1 having a much higher RSS than
> P1/C1/C2's limit.
> And if there is a limit for the management agent cgroup, the agent
> will be OOM instead of OOM in P1.

I think I will reply to another similar email.

If you want OOM in P1, you can have an admin program. Fork and execute
a new process, add the new process into P1, then swap off from that
new process.

>
> Simply moving a process between the child cgroups of the same parent
> cgroup won't cause such an issue, things get weird when swapoff is
> involved.
>
> Or maybe we should try to be compatible, and introduce a sysctl or
> cmdline for this?

If the above suggestion works, then you don't need to change swap off?

If you still want to change the charging model, I would like to see
the bigger picture: what rules it follows and how it works in other
situations.

Chris
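
For illustration only, a minimal sketch of the admin-program idea above
(fork a new process, add it to P1, swap off from it). It assumes cgroup
v2, root privileges, and Python 3.9 or later. The helper name
run_in_cgroup, the cgroup path /sys/fs/cgroup/P1 and the device
/dev/sdX are hypothetical placeholders, not anything from the patch.
Which cgroup the swapped-in pages end up charged to still depends on
the kernel behavior discussed in this thread.

```python
import os


def run_in_cgroup(cgroup_dir, command):
    """Fork a child, move it into cgroup_dir, then exec command from it.

    Returns the child's exit code.
    """
    procs_file = os.path.join(cgroup_dir, "cgroup.procs")
    pid = os.fork()
    if pid == 0:
        # Child: join the target cgroup first, so that whatever the
        # command does afterwards runs as a member of that cgroup.
        try:
            with open(procs_file, "w") as f:
                f.write(str(os.getpid()))
            os.execvp(command[0], command)
        finally:
            # Reached only if the write or the exec failed.
            os._exit(127)
    # Parent: wait for the child and report how it exited.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# Hypothetical usage (needs root; both paths are placeholders):
# run_in_cgroup("/sys/fs/cgroup/P1", ["swapoff", "/dev/sdX"])
```

The command is passed in as a list so the same helper can be tried with
a harmless command before pointing it at swapoff.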