Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
[...]
> > I guess we want to inherit the value on the memcg creation but I agree
> > that enforcing parent setting is weird. I will think about it some more
> > but I agree that it is saner to only enforce per memcg value.
> 
> I'm not against, but we should come up with a good explanation, why we're
> inheriting it; or not inherit.

Inheriting sounds like a less surprising behavior. Once you opt in for
oom_group you can expect that descendants are going to assume the same
unless they explicitly state otherwise.

[...]
> > > > > @@ -962,6 +968,48 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
> > > > >  	__oom_kill_process(victim);
> > > > >  }
> > > > >  
> > > > > +static int oom_kill_memcg_member(struct task_struct *task, void *unused)
> > > > > +{
> > > > > +	if (!tsk_is_oom_victim(task)) {
> > > > 
> > > > How can this happen?
> > > 
> > > We do start with killing the largest process, and then iterate over all tasks
> > > in the cgroup. So, this check is required to avoid killing tasks which are
> > > already in the termination process.
> > 
> > Do you mean we have tsk_is_oom_victim && MMF_OOM_SKIP == T?
> 
> No, just tsk_is_oom_victim. We're are killing the biggest task, and then _all_
> tasks. This is a way to skip the biggest task, and do not kill it again.

OK, I have missed that part. Why are we doing that actually? Why don't
we simply do 
	/* If oom_group flag is set, kill all belonging tasks */
	if (mem_cgroup_oom_group(oc->chosen_memcg))
		mem_cgroup_scan_tasks(oc->chosen_memcg, oom_kill_memcg_member,
				      NULL);

we are going to kill all the tasks anyway.

[...]
> > > > Hmm, does the full dump_header really apply for the new heuristic? E.g.
> > > > does it make sense to dump_tasks()? Would it make sense to print stats
> > > > of all eligible memcgs instead?
> > > 
> > > Hm, this is a tricky part: the dmesg output is at some point a part of ABI,
> > 
> > People are parsing oom reports but I disagree this is an ABI of any
> > sort. The report is closely tight to the particular implementation and
> > as such it has changed several times over the time.
> > 
> > > but is also closely connected with the implementation. So I would suggest
> > > to postpone this until we'll get more usage examples and will better
> > > understand what information we need.
> > 
> > I would drop tasks list at least because that is clearly misleading in
> > this context because we are not selecting from all tasks. We are
> > selecting between memcgs. The memcg information can be added in a
> > separate patch of course.
> 
> Let's postpone it until we'll land the rest of the patchset.

This is certainly not a show stopper but I would like to resolve it
sooner rather than later.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux