On Tue 08-02-22 13:40:57, Waiman Long wrote:
> On 2/8/22 07:13, Michal Hocko wrote:
> > On Mon 07-02-22 19:05:31, Waiman Long wrote:
> > > It was found that a number of dying memcgs were not freed because
> > > they were pinned by some charged pages that were present. Even "echo 1 >
> > > /proc/sys/vm/drop_caches" wasn't able to free those pages. These dying
> > > but not freed memcgs tend to increase in number over time, with the side
> > > effect that percpu memory consumption as shown in /proc/meminfo also
> > > increases over time.
> >
> > I still believe that this is a very suboptimal way to debug offline memcgs,
> > but memcg information can be useful in other contexts and it doesn't
> > cost us anything except for additional output, so I am fine with this.
>
> I am planning to have a follow-up patch that adds a new debugfs file for
> printing page information associated with dying memcgs only. It will be
> based on the existing page_owner code, though. So I need to get this patch
> in first.

Sure. I would still give the drgn approach a shot, as it can be much more
versatile without any additional kernel code.

[...]

> > > +	dying = (memcg->css.flags & CSS_DYING);
> >
> > Is there any specific reason why you haven't used mem_cgroup_online?
>
> Not really. However, I think checking for CSS_DYING makes more sense now
> that I am using the term "dying".

I do not really care much, but I thought CSS_DYING is a cgroup-internal
thing. We have a high-level API, so I thought it would be used
preferably.
-- 
Michal Hocko
SUSE Labs