Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed

Michal Hocko <mhocko@xxxxxxxx> · Wed, 30 Nov 2022 14:15:06 +0100

On Wed 30-11-22 15:01:58, chengkaitao wrote:
> From: chengkaitao <pilgrimtao@xxxxxxxxx>
> 
> We created a new interface <memory.oom.protect> for memory, If there is
> the OOM killer under parent memory cgroup, and the memory usage of a
> child cgroup is within its effective oom.protect boundary, the cgroup's
> tasks won't be OOM killed unless there is no unprotected tasks in other
> children cgroups. It draws on the logic of <memory.min/low> in the
> inheritance relationship.

Could you be more specific about the use cases? How do you tune
oom.protect wrt other tunables? How does this interact with
oom_score_adj tuning (e.g. a first-hand oom victim with score_adj 1000
sitting in an oom protected memcg)?

I haven't really read through the whole patch but this struck me as odd.

> @@ -552,8 +552,19 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
>  	unsigned long totalpages = totalram_pages() + total_swap_pages;
>  	unsigned long points = 0;
>  	long badness;
> +#ifdef CONFIG_MEMCG
> +	struct mem_cgroup *memcg;
>  
> -	badness = oom_badness(task, totalpages);
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(task);
> +	if (memcg && !css_tryget(&memcg->css))
> +		memcg = NULL;
> +	rcu_read_unlock();
> +
> +	update_parent_oom_protection(root_mem_cgroup, memcg);
> +	css_put(&memcg->css);
> +#endif
> +	badness = oom_badness(task, totalpages, MEMCG_OOM_PROTECT);

The badness means a different thing depending on which memcg hierarchy
subtree you look at. Scaling based on the global oom could get really
misleading.
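
To illustrate the concern, here is a toy userspace model. It is not the
patch's logic and not kernel code: the struct, the helper names and the
capping rule are all hypothetical. It only assumes, loosely following
memory.min/low, that the OOM root's own setting is ignored and that each
level's protection is capped by the levels above it.

```c
/* Hypothetical toy model for illustration only, not kernel code. */
struct toy_memcg {
	const char *name;
	unsigned long protect;	/* configured oom.protect, in pages */
};

/*
 * Effective protection of the last cgroup in path[] for an OOM rooted at
 * path[root_idx]. Assumption: the root's own value does not count, and
 * the result is the minimum of the values strictly below the root.
 */
static unsigned long toy_effective_protect(const struct toy_memcg *path,
					   int depth, int root_idx)
{
	unsigned long eff = 0;
	int i;

	for (i = root_idx + 1; i < depth; i++)
		if (i == root_idx + 1 || path[i].protect < eff)
			eff = path[i].protect;
	return eff;
}

/* Pages that would still count towards badness in this toy model. */
static unsigned long toy_unprotected_pages(unsigned long usage,
					   unsigned long eff)
{
	return usage > eff ? usage - eff : 0;
}
```

With a path of root (0), A (100) and B (400) and a task using 300 pages
in B, this model leaves 200 pages unprotected for a global OOM but none
for an OOM rooted at A. A single number exported from the global point
of view cannot express both.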

-- 
Michal Hocko
SUSE Labs