Re: [PATCH] mm, memcg: clear page protection when memcg oom group happens

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 25 Nov 2019 13:31:23 +0100



On Mon 25-11-19 20:17:15, Yafang Shao wrote:
> On Mon, Nov 25, 2019 at 7:54 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Mon 25-11-19 19:37:59, Yafang Shao wrote:
> > > On Mon, Nov 25, 2019 at 7:08 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon 25-11-19 05:14:53, Yafang Shao wrote:
> > > > > We set memory.oom.group so that all processes in this memcg are killed by
> > > > > the OOM killer to free more pages. In this case, it doesn't make sense to
> > > > > protect the pages with memory.{min, low} again if they are set.
> > > >
> > > > I do not see why. What does group OOM killing have to do with
> > > > the reclaim protection? What is the actual problem you are trying to
> > > > solve?
> > > >
> > >
> > > The cgroup is treated as an indivisible workload when memory.oom.group
> > > is set and the OOM killer is trying to kill a process in this cgroup.
> >
> > Yes this is true.
> >
> > > We set memory.oom.group to guarantee workload integrity. Now
> > > that the processes are all killed, why keep the page cache here?
> >
> > Because an administrator has configured the reclaim protection in a
> > certain way and hopefully had a good reason to do that. We are not going
> > to override that configuration just because the OOM killer was invoked
> > and killed tasks in that memcg. The workload might get restarted and it
> > would run under different constraints all of a sudden, which is not
> > expected.
> >
> > In short, the kernel should never silently change the configuration made by
> > an administrator.
> 
> Understood.
> 
> So what about the changes below? We don't override the admin setting,
> but we reclaim the page cache from it if this memcg is OOM killed.
> Something like,
> 
> mem_cgroup_protected
> {
> ...
> +       if (!cgroup_is_populated(memcg->css.cgroup) &&
> +           mem_cgroup_under_oom_group_kill(memcg))
> +               return MEMCG_PROT_NONE;
> +
>         usage = page_counter_read(&memcg->memory);
>         if (!usage)
>                 return MEMCG_PROT_NONE;
> }

I assume that mem_cgroup_under_oom_group_kill is essentially
	memcg->under_oom && memcg->oom_group
But that doesn't really help much because by that point all the reclaim
attempts have already been made and have failed. I do not remember the exact
details about under_oom but I have a recollection that it wouldn't really work
for cgroup v2 because oom_control is not in place there and so the state would
be set only for a very short period of time.
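
To make that assumption concrete, here is a minimal userspace sketch of what
I read the helper to be. This is not kernel code: struct memcg_model is a
made-up stand-in for the two struct mem_cgroup fields named above, and the
helper body is only my guess, since the proposal names the function without
showing it.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two struct mem_cgroup fields. */
struct memcg_model {
	int under_oom;   /* non-zero while the memcg is marked as under OOM */
	bool oom_group;  /* mirrors the memory.oom.group setting */
};

/*
 * Assumed body of the proposed helper: true only when the memcg is
 * currently marked under OOM and group killing is enabled.
 */
static bool mem_cgroup_under_oom_group_kill(const struct memcg_model *memcg)
{
	return memcg->under_oom && memcg->oom_group;
}
```

If under_oom is only set for a brief window, as I suspect is the case on
cgroup v2, this predicate would be false for nearly every reclaim pass that
consults it.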

Again, what is the problem that you are trying to fix?
-- 
Michal Hocko
SUSE Labs