* tip-bot for Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > Commit-ID: dbe08d82ce3967ccdf459f7951d02589cf967300 > Gitweb: http://git.kernel.org/tip/dbe08d82ce3967ccdf459f7951d02589cf967300 > Author: Oleg Nesterov <oleg@xxxxxxxxxx> > AuthorDate: Wed, 19 Jan 2011 19:22:07 +0100 > Committer: Ingo Molnar <mingo@xxxxxxx> > CommitDate: Wed, 19 Jan 2011 20:04:27 +0100 > > perf: Fix find_get_context() vs perf_event_exit_task() race > > find_get_context() must not install the new perf_event_context > if the task has already passed perf_event_exit_task(). > > If nothing else, this means the memory leak. Initially > ctx->refcount == 2, it is supposed that > perf_event_exit_task_context() should participate and do the > necessary put_ctx(). > > find_lively_task_by_vpid() checks PF_EXITING but this buys > nothing, by the time we call find_get_context() this task can be > already dead. To the point, cmpxchg() can succeed when the task > has already done the last schedule(). > > Change find_get_context() to populate task->perf_event_ctxp[] > under task->perf_event_mutex, this way we can trust PF_EXITING > because perf_event_exit_task() takes the same mutex. > > Also, change perf_event_exit_task_context() to use > rcu_dereference(). Probably this is not strictly needed, but > with or without this change find_get_context() can race with > setup_new_exec()->perf_event_exit_task(), rcu_dereference() > looks better. > > Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx> > Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> > Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> > Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx> > Cc: Paul Mackerras <paulus@xxxxxxxxx> > Cc: Prasad <prasad@xxxxxxxxxxxxxxxxxx> > Cc: Roland McGrath <roland@xxxxxxxxxx> > LKML-Reference: <20110119182207.GB12183@xxxxxxxxxx> > Signed-off-by: Ingo Molnar <mingo@xxxxxxx> > --- > kernel/perf_event.c | 34 ++++++++++++++++++++-------------- > 1 files changed, 20 insertions(+), 14 deletions(-) > > diff --git a/kernel/perf_event.c b/kernel/perf_event.c > index 84522c7..4ec55ef 100644 > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -2201,13 +2201,6 @@ find_lively_task_by_vpid(pid_t vpid) > if (!task) > return ERR_PTR(-ESRCH); > > - /* > - * Can't attach events to a dying task. > - */ > - err = -ESRCH; > - if (task->flags & PF_EXITING) > - goto errout; > - > /* Reuse ptrace permission checks for now. */ > err = -EACCES; > if (!ptrace_may_access(task, PTRACE_MODE_READ)) > @@ -2268,14 +2261,27 @@ retry: > > get_ctx(ctx); > > - if (cmpxchg(&task->perf_event_ctxp[ctxn], NULL, ctx)) { > - /* > - * We raced with some other task; use > - * the context they set. > - */ > + err = 0; > + mutex_lock(&task->perf_event_mutex); > + /* > + * If it has already passed perf_event_exit_task(). > + * we must see PF_EXITING, it takes this mutex too. > + */ > + if (task->flags & PF_EXITING) > + err = -ESRCH; > + else if (task->perf_event_ctxp[ctxn]) > + err = -EAGAIN; > + else > + rcu_assign_pointer(task->perf_event_ctxp[ctxn], ctx); > + mutex_unlock(&task->perf_event_mutex); > + > + if (unlikely(err)) { > put_task_struct(task); > kfree(ctx); > - goto retry; > + > + if (err == -EAGAIN) > + goto retry; > + goto errout; > } > } > > @@ -6127,7 +6133,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn) > * scheduled, so we are now safe from rescheduling changing > * our context. > */ > - child_ctx = child->perf_event_ctxp[ctxn]; > + child_ctx = rcu_dereference(child->perf_event_ctxp[ctxn]); > task_ctx_sched_out(child_ctx, EVENT_ALL); > > /* hm, this one's causing: [ 25.557579] =================================================== [ 25.561361] [ INFO: suspicious rcu_dereference_check() usage. ] [ 25.561361] --------------------------------------------------- [ 25.561361] kernel/perf_event.c:6136 invoked rcu_dereference_check() without protection! [ 25.561361] [ 25.561361] other info that might help us debug this: [ 25.561361] [ 25.561361] [ 25.561361] rcu_scheduler_active = 1, debug_locks = 0 [ 25.561361] no locks held by true/1397. [ 25.561361] [ 25.561361] stack backtrace: [ 25.561361] Pid: 1397, comm: true Not tainted 2.6.38-rc1-tip+ #86752 [ 25.561361] Call Trace: [ 25.561361] [<ffffffff8106cd98>] ? lockdep_rcu_dereference+0xaa/0xb3 [ 25.561361] [<ffffffff810b34ee>] ? perf_event_exit_task+0x118/0x22a [ 25.561361] [<ffffffff811133b8>] ? free_fs_struct+0x44/0x48 [ 25.561361] [<ffffffff810434ef>] ? do_exit+0x2c8/0x770 [ 25.561361] [<ffffffff813a52ed>] ? retint_swapgs+0xe/0x13 [ 25.561361] [<ffffffff81043c3c>] ? do_group_exit+0x82/0xad [ 25.561361] [<ffffffff81043c7e>] ? sys_exit_group+0x17/0x1b [ 25.561361] [<ffffffff81002acb>] ? system_call_fastpath+0x16/0x1b Any ideas? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html