Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task

[Sorry about the slow response, but I was offline for almost two weeks
and am catching up with a tsunami in my inbox now]

On Fri 09-03-18 19:48:46, Tetsuo Handa wrote:
> Kohli, Gaurav wrote:
> > > t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> > > that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> > > exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> > > after task_unlock(t) is called. The race window seems difficult to trigger.
> > > Maybe preemption happened there, because oom_badness() is outside of an RCU
> > > grace period upon leaving find_lock_task_mm() when called from proc_oom_score().
> > 
> > Hi Tetsuo,
> > 
> > Yes, it is not easy to reproduce; we have seen it only twice so far, and I
> > agree with your analysis. But David is already fixing this in a different
> > way, and that also looks better to me:
> > 
> > https://patchwork.kernel.org/patch/10265641/
> > 
> 
> Yes, I'm aware of that patch.
> 
> > But if we need to keep that code, then bumping up the task
> > reference is the only option I can think of right now.
> 
> I don't think so, for I think it is safe to call
> has_capability_noaudit(p) with p->alloc_lock held.

This, however, adds a subtle locking assumption here, and we should
rather not do so. The scope of alloc_lock is quite messy already, and
adding to it is definitely not an improvement.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f2e7dfb..4efcfb8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> -	task_unlock(p);
>  
>  	/*
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> +	task_unlock(p);
>  
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;

-- 
Michal Hocko
SUSE Labs
