On Mon, Apr 12, 2021 at 01:14:57PM -0400, Johannes Weiner wrote: > On Fri, Apr 09, 2021 at 06:29:46PM -0700, Roman Gushchin wrote: > > On Fri, Apr 09, 2021 at 08:29:41PM +0800, Muchun Song wrote: > > > Since the following patchsets applied. All the kernel memory are charged > > > with the new APIs of obj_cgroup. > > > > > > [v17,00/19] The new cgroup slab memory controller > > > [v5,0/7] Use obj_cgroup APIs to charge kmem pages > > > > > > But user memory allocations (LRU pages) pinning memcgs for a long time - > > > it exists at a larger scale and is causing recurring problems in the real > > > world: page cache doesn't get reclaimed for a long time, or is used by the > > > second, third, fourth, ... instance of the same job that was restarted into > > > a new cgroup every time. Unreclaimable dying cgroups pile up, waste memory, > > > and make page reclaim very inefficient. > > > > > > We can convert LRU pages and most other raw memcg pins to the objcg direction > > > to fix this problem, and then the LRU pages will not pin the memcgs. > > > > > > This patchset aims to make the LRU pages to drop the reference to memory > > > cgroup by using the APIs of obj_cgroup. Finally, we can see that the number > > > of the dying cgroups will not increase if we run the following test script. > > > > > > ```bash > > > #!/bin/bash > > > > > > cat /proc/cgroups | grep memory > > > > > > cd /sys/fs/cgroup/memory > > > > > > for i in range{1..500} > > > do > > > mkdir test > > > echo $$ > test/cgroup.procs > > > sleep 60 & > > > echo $$ > cgroup.procs > > > echo `cat test/cgroup.procs` > cgroup.procs > > > rmdir test > > > done > > > > > > cat /proc/cgroups | grep memory > > > ``` > > > > > > Patch 1 aims to fix page charging in page replacement. > > > Patch 2-5 are code cleanup and simplification. > > > Patch 6-18 convert LRU pages pin to the objcg direction. > > > > > > Any comments are welcome. Thanks. > > > > Indeed the problem exists for a long time and it would be nice to fix it. > > However I'm against merging the patchset in the current form (there are some > > nice fixes/clean-ups, which can/must be applied independently). Let me explain > > my concerns: > > > > Back to the new slab controller discussion obj_cgroup was suggested by Johannes > > as a union of two concepts: > > 1) reparenting (basically an auto-pointer to a memcg in c++ terms) > > 2) byte-sized accounting > > > > I was initially against this union because I anticipated that the reparenting > > part will be useful separately. And the time told it was true. > > "The idea of moving stocks and leftovers to the memcg_ptr/obj_cgroup > level is really good." > > https://lore.kernel.org/lkml/20191025200020.GA8325@xxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > If you recall, the main concern was how the byte charging interface > was added to the existing page charging interface, instead of being > layered on top of it. I suggested to do that and, since there was no > other user for the indirection pointer, just include it in the API. > > It made sense at the time, and you seemed to agree. But I also agree > it makes sense to factor it out now that more users are materializing. Agreed. > > > I still think obj_cgroup API must be significantly reworked before being > > applied outside of the kmem area: reparenting part must be separated > > and moved to the cgroup core level to be used not only in the memcg > > context but also for other controllers, which are facing similar problems. > > Spilling obj_cgroup API in the current form over all memcg code will > > make it more complicated and will delay it, given the amount of changes > > and the number of potential code conflicts. > > > > I'm working on the generalization of obj_cgroup API (as described above) > > and expect to have some patches next week. > > Yeah, splitting the byte charging API from the reference API and > making the latter cgroup-generic makes sense. I'm looking forward to > your patches. > > And yes, the conflicts between that work and Muchun's patches would be > quite large. However, most of them would come down to renames, since > the access rules and refcounting sites will remain the same, so it > shouldn't be too bad to rebase Muchun's patches on yours. And we can > continue reviewing his patches for correctness for now. Sounds good to me! Thanks