Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup v2

KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Tue, 9 Feb 2010 17:02:28 +0900

On Mon, 8 Feb 2010 23:50:12 -0800 (PST)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Tue, 9 Feb 2010, KAMEZAWA Hiroyuki wrote:
> 
> > Index: mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> > ===================================================================
> > --- mmotm-2.6.33-Feb06.orig/include/linux/memcontrol.h
> > +++ mmotm-2.6.33-Feb06/include/linux/memcontrol.h
> > @@ -71,7 +71,8 @@ extern unsigned long mem_cgroup_isolate_
> >  					struct mem_cgroup *mem_cont,
> >  					int active, int file);
> >  extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
> > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
> > +int task_in_oom_mem_cgroup(struct task_struct *task,
> > +	const struct mem_cgroup *mem);
> 
> This is only called from the oom killer, so I'm not sure this needs to 
> be renamed.  
Why I renamed this is "be careful when a new user calls this".

> It seems like any caller of this function, present or future, 
> would be doing a tasklist iteration while holding a readlock on 
> tasklist_lock, so perhaps just document that task_in_mem_cgroup() requires 
> that?

Hmm. ok. I avoid this rename. It will make the patch smaller.

> 
> >  
> >  extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
> >  extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
> > @@ -215,7 +216,7 @@ static inline int mm_match_cgroup(struct
> >  	return 1;
> >  }
> >  
> > -static inline int task_in_mem_cgroup(struct task_struct *task,
> > +static inline int task_in_oom_mem_cgroup(struct task_struct *task,
> >  				     const struct mem_cgroup *mem)
> >  {
> >  	return 1;
> > Index: mmotm-2.6.33-Feb06/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb06.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb06/mm/memcontrol.c
> > @@ -781,16 +781,40 @@ void mem_cgroup_move_lists(struct page *
> >  	mem_cgroup_add_lru_list(page, to);
> >  }
> >  
> > -int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> > +/*
> > + * This function is called from OOM Killer. This checks the task is mm_owner
> > + * and checks it's mem_cgroup is under oom.
> > + */
> > +int task_in_oom_mem_cgroup(struct task_struct *task,
> > +		const struct mem_cgroup *mem)
> >  {
> > +	struct mm_struct *mm;
> >  	int ret;
> >  	struct mem_cgroup *curr = NULL;
> >  
> > -	task_lock(task);
> > +	/*
> > + 	 * The task's task->mm pointer is guarded by task_lock() but it's
> > + 	 * risky to take task_lock in oom kill situaion. Oom-killer may
> > + 	 * kill a task which is in unknown status and cause siginificant delay
> > + 	 * or deadlock.
> > + 	 * So, we use some loose way. Because we're under taslist lock, "task"
> > + 	 * pointer is always safe and we can access it. So, accessing mem_cgroup
> > + 	 * via task struct is safe. To check the task is mm owner, we do loose
> > + 	 * check. And this is enough.
> > + 	 * There is small race at updating mm->onwer but we can ignore it.
> > + 	 * A problematic race here means that oom-selection logic by walking
> > + 	 * task list itself is racy. We can't make any strict guarantee between
> > + 	 * task's cgroup status and oom-killer selection, anyway. And, in real
> > + 	 * world, this will be no problem.
> > + 	 */
> > +	mm = task->mm;
> > +	if (!mm || mm->owner != task)
> > +		return 0;
> 
> You can't dereference task->mm->owner without holding task_lock(task), but 
> I don't see why you need to even deal with task->mm.  All callers to this 
> function will check for !task->mm either during their iterations or with 
> oom_kill_task() returning 0.
> 
Just for being careful. We don't hold task_lock(), which guards task->mm in
callers.

> >  	rcu_read_lock();
> > -	curr = try_get_mem_cgroup_from_mm(task->mm);
> > +	curr = mem_cgroup_from_task(task);
> > +	if (!css_tryget(&curr->css));
> > +		curr = NULL;
> 
> We can always dereference p because of tasklist_lock, there should be no 
> need to do rcu_read_lock() or any rcu dereference, so you should be able 
> to just do this:
> 
> 	do {
> 		curr = mem_cgroup_from_task(task);
> 		if (!curr)
> 			break;
> 	} while (!css_tryget(&curr->css));
> 
Ok, I missed that. thank you. I'll use this code.

> If you like that better, I suggest sending your original two-liner fix 
> using task_in_mem_cgroup() while taking task_lock(p) to stable and then 
> improving on it with a follow-up patch for mainline to do this refcount 
> variation.
> 
Hmm. ok. I'll devide the patch into 2 parts. Thank you for review.

Regards,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>