Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 29-06-16 02:13:53, Oleg Nesterov wrote:
> Michal,
> 
> I am already sleeping, I'll try to reply to other parts of your email
> (and other emails) tomorrow, just some notes about the patch you propose.

Thanks!

> And cough sorry for noise... I personally hate-hate-hate every new "oom"
> member you and Tetsuo add into task/signal_struct ;)

I am not really happy about that either. I wish I could find a better
way...

> But not in this case, because I _think_ we need signal_struct->mm
> anyway in the long term.
> 
> So at first glance this patch makes sense, but unless I missed something
> (the patch doesn't apply I can be easily wrong),

This is on top of the current mmotm tree which contains other oom
changes.

[...]
> > +void mark_oom_victim(struct task_struct *tsk, struct mm_struct *mm)
> >  {
> >  	WARN_ON(oom_killer_disabled);
> >  	/* OOM killer might race with memcg OOM */
> >  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> >  		return;
> > +
> >  	atomic_inc(&tsk->signal->oom_victims);
> > +
> > +	/* oom_mm is bound to the signal struct life time */
> > +	if (!tsk->signal->oom_mm) {
> > +		atomic_inc(&mm->mm_count);
> > +		tsk->signal->oom_mm = mm;
> 
> Looks racy, but it is not because we rely on oom_lock? Perhaps a comment
> makes sense.

mark_oom_victim will be called only for the current or under the
task_lock so it should be stable. Except for...

> > @@ -838,8 +826,8 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  	 * If the task is already exiting, don't alarm the sysadmin or kill
> >  	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> >  	 */
> > -	if (task_will_free_mem(p)) {
> > -		mark_oom_victim(p);
> > +	if (mm && task_will_free_mem(p)) {
> > +		mark_oom_victim(p, mm);

This one. I didn't bother to cover it for the example patch but I have a
plan to address that. There are two possible ways. One is to pin
mm_count in oom_badness() so that we have a guarantee that it will not
get released from under us and the other one is to make
task_will_free_mem task_lock friendly and call this under the lock as we
used to.
 
> And this looks really racy at first glance. Suppose that this memory hog execs
> (this changes its ->mm) and then exits so that task_will_free_mem() == T, in
> this case "mm" has nothing to do with tsk->mm and it can be already freed.

Hmm, I didn't think about exec case. And I guess we have never cared
about that race. We just select a task and then kill it. The fact that
it is not sitting on the same memory anymore is silently ignored... But
I have to think about it more. I would be more worried about the use
after free.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]