Re: [REPOST] [PATCH 2/2] mm,oom: Reverse the order of setting TIF_MEMDIE and sending SIGKILL.

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 25 Aug 2015 16:17:35 +0200

On Tue 25-08-15 21:06:36, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 5249e7e..c0a5a69 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -555,12 +555,17 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > >  	/* Get a reference to safely compare mm after task_unlock(victim) */
> > >  	mm = victim->mm;
> > >  	atomic_inc(&mm->mm_users);
> > > -	mark_oom_victim(victim);
> > >  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> > >  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> > >  		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
> > >  		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
> > >  	task_unlock(victim);
> > > +	/* Send SIGKILL before setting TIF_MEMDIE. */
> > > +	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> > > +	task_lock(victim);
> > > +	if (victim->mm)
> > > +		mark_oom_victim(victim);
> > > +	task_unlock(victim);
> > 
> > Why cannot you simply move do_send_sig_info without touching
> > mark_oom_victim? Are you still able to trigger the issue if you just
> > kill before crawling through all the tasks sharing the mm?
> 
> If you meant
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1ecc0bc..ea578fb 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -560,6 +560,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>                 K(get_mm_counter(victim->mm, MM_ANONPAGES)),
>                 K(get_mm_counter(victim->mm, MM_FILEPAGES)));
>         task_unlock(victim);
> +       do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> 
>         /*
>          * Kill all user processes sharing victim->mm in other thread groups, if
> @@ -585,7 +586,6 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>                 }
>         rcu_read_unlock();
> 
> -       do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
>         put_task_struct(victim);
>  }
>  #undef K
> 
> then yes I still can trigger the issue under very limited condition (i.e.
> ran as root user for polling kernel messages with realtime priority, after
> killing all processes using SysRq-i).

Yes, that's why I also said that preempt_{enable,disable} around could
be used:

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ecc0bcaecc5..331c8ac23cc6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -542,8 +542,15 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	read_unlock(&tasklist_lock);
 
+	/*
+	 * Make sure that nobody will preempt us between the victim gets access
+	 * to memory reserves and it gets killed. It could depleat the memory
+	 * reserves otherwise.
+	 */
+	preempt_disable();
 	p = find_lock_task_mm(victim);
 	if (!p) {
+		preempt_enable();
 		put_task_struct(victim);
 		return;
 	} else if (victim != p) {
@@ -560,6 +567,8 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
 		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
 	task_unlock(victim);
+	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
+	preempt_enable();
 
 	/*
 	 * Kill all user processes sharing victim->mm in other thread groups, if
@@ -585,7 +594,6 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 		}
 	rcu_read_unlock();
 
-	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }
 #undef K

[...]

> > The code would be easier then and the race window much smaller. If we
> > really needed to prevent from preemption then preempt_{enable,disable}
> > aournd the whole task_lock region + do_send_sig_info would be still
> > easier to follow than re-taking task_lock.
> 
> What's wrong with re-taking task_lock? It seems to me that re-taking
> task_lock is more straightforward and easier to follow.

I dunno it looks more awkward to me. You have to re-check the victim->mm
after retaking the lock because situation might have changed while the
lock was dropped. If the mark_oom_victim & do_send_sig_info are in the
same preempt region then nothing like that is needed. But this is
probably a matter of taste. I find the above more readable but let's see
what others think.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>