[Sorry about the slow response but I was offline for almost two weeks
and catching up with a tsunami in my inbox now]

On Fri 09-03-18 19:48:46, Tetsuo Handa wrote:
> Kohli, Gaurav wrote:
> > > t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> > > that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> > > exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> > > after task_unlock(t) is called. Seems difficult to trigger race window. Maybe
> > > something has preempted because oom_badness() becomes outside of RCU grace
> > > period upon leaving find_lock_task_mm() when called from proc_oom_score().
> >
> > Hi Tetsuo,
> >
> > Yes it is not easy to reproduce seen twice till now and i agree with
> > your analysis. But David has already fixing this in different way,
> > So that also looks better to me:
> >
> > https://patchwork.kernel.org/patch/10265641/
> >
>
> Yes, I'm aware of that patch.
>
> > But if need to keep that code, So we have to bump up the task
> > reference that's only i can think of now.
>
> I don't think so, for I think it is safe to call
> has_capability_noaudit(p) with p->alloc_lock held.

This however adds a subtle assumption on locking here and we should
rather not do so. The scope of alloc_lock is quite messy already and
adding on top is definitely not an improvement.

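To illustrate the alternative Gaurav mentions (bumping the task
reference rather than widening the scope of alloc_lock), here is a
minimal user-space sketch. It is an analogy and not kernel code:
fake_task, fake_cred, fake_get_task(), fake_put_task(), fake_badness()
and demo() are all hypothetical stand-ins, and the get/put pair is only
loosely modelled on get_task_struct()/put_task_struct().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the credentials hanging off a task. */
struct fake_cred {
	bool cap_sys_admin;
};

/* Hypothetical stand-in for task_struct. */
struct fake_task {
	atomic_int usage;	/* reference count */
	struct fake_cred *cred;	/* freed only when the last reference drops */
	unsigned long rss_pages;
};

static struct fake_task *fake_task_new(unsigned long rss, bool admin)
{
	struct fake_task *t = malloc(sizeof(*t));
	struct fake_cred *c = malloc(sizeof(*c));

	if (!t || !c) {
		free(t);
		free(c);
		return NULL;
	}
	c->cap_sys_admin = admin;
	atomic_init(&t->usage, 1);	/* the creator owns one reference */
	t->cred = c;
	t->rss_pages = rss;
	return t;
}

/* Pin the task so that it cannot be freed under the caller. */
static void fake_get_task(struct fake_task *t)
{
	atomic_fetch_add(&t->usage, 1);
}

/* Drop one reference; returns true if this call freed the task. */
static bool fake_put_task(struct fake_task *t)
{
	if (atomic_fetch_sub(&t->usage, 1) == 1) {
		free(t->cred);
		free(t);
		return true;
	}
	return false;
}

/* Same arithmetic as the 3% root bonus in the quoted hunk. */
static unsigned long fake_badness(struct fake_task *t)
{
	unsigned long points = t->rss_pages;

	if (t->cred->cap_sys_admin)
		points -= (points * 3) / 100;
	return points;
}

static unsigned long demo(void)
{
	struct fake_task *t = fake_task_new(1000, true);
	unsigned long points;
	bool freed_early;

	if (!t)
		return 0;

	fake_get_task(t);			/* the scorer pins the task */
	freed_early = fake_put_task(t);		/* "exit path" drops its reference */
	assert(!freed_early);			/* the scorer's reference keeps it alive */

	points = fake_badness(t);		/* cred is still valid here */
	fake_put_task(t);			/* last reference: task and cred are freed */
	return points;
}
```

The only point of the sketch is that the credentials stay valid because
the reader holds its own reference, independent of which lock happens
to be held at that moment.
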
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f2e7dfb..4efcfb8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> -	task_unlock(p);
>  
>  	/*
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> +	task_unlock(p);
>  
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;

-- 
Michal Hocko
SUSE Labs