On Tue 08-02-22 13:40:57, Waiman Long wrote:
> On 2/8/22 07:13, Michal Hocko wrote:
> > On Mon 07-02-22 19:05:31, Waiman Long wrote:
> > > It was found that a number of dying memcgs were not freed because
> > > they were pinned by some charged pages that were present. Even "echo 1 >
> > > /proc/sys/vm/drop_caches" wasn't able to free those pages. These dying
> > > but not freed memcgs tend to increase in number over time, with the side
> > > effect that percpu memory consumption as shown in /proc/meminfo also
> > > increases over time.
> >
> > I still believe that this is a very suboptimal way to debug offline memcgs,
> > but memcg information can be useful in other contexts and it doesn't
> > cost us anything except for additional output, so I am fine with this.
>
> I am planning to have a follow-up patch that adds a new debugfs file for
> printing page information associated with dying memcgs only. It will be
> based on the existing page_owner code, though. So I need to get this patch
> in first.

Sure. I would still give the drgn approach a shot, as it can be much more
versatile without any additional kernel code.

[...]

> > > +	dying = (memcg->css.flags & CSS_DYING);
> >
> > Is there any specific reason why you haven't used mem_cgroup_online?
>
> Not really. However, I think checking for CSS_DYING makes more sense now
> that I am using the term "dying".

I do not really care much, but I thought CSS_DYING is a cgroup-internal
thing. We have a high-level API, so I thought it would be used
preferably.
-- 
Michal Hocko
SUSE Labs