On Tue, Apr 11, 2023 at 4:36 PM T.J. Mercier <tjmercier@xxxxxxxxxx> wrote:
>
> When a memcg is removed by userspace it gets offlined by the kernel.
> Offline memcgs are hidden from user space, but they still live in the
> kernel until their reference count drops to 0. New allocations cannot
> be charged to offline memcgs, but existing allocations charged to
> offline memcgs remain charged, and hold a reference to the memcg.
>
> As such, an offline memcg can remain in the kernel indefinitely,
> becoming a zombie memcg. The accumulation of a large number of zombie
> memcgs leads to increased system overhead (mainly percpu data in struct
> mem_cgroup). It also causes some kernel operations that scale with the
> number of memcgs to become less efficient (e.g. reclaim).
>
> There are currently out-of-tree solutions which attempt to
> periodically clean up zombie memcgs by reclaiming from them. However
> that is not effective for non-reclaimable memory, which it would be
> better to reparent or recharge to an online cgroup. There are also
> proposed changes that would benefit from recharging for shared
> resources like pinned pages, or DMA buffer pages.

I am very interested in attending this discussion; it's something that
I have been actively looking into -- specifically recharging pages of
offlined memcgs.

>
> Suggested attendees:
> Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Yu Zhao <yuzhao@xxxxxxxxxx>
> T.J. Mercier <tjmercier@xxxxxxxxxx>
> Tejun Heo <tj@xxxxxxxxxx>
> Shakeel Butt <shakeelb@xxxxxxxxxx>
> Muchun Song <muchun.song@xxxxxxxxx>
> Johannes Weiner <hannes@xxxxxxxxxxx>
> Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Alistair Popple <apopple@xxxxxxxxxx>
> Jason Gunthorpe <jgg@xxxxxxxxxx>
> Kalesh Singh <kaleshsingh@xxxxxxxxxx>