On Mon, Sep 1, 2014 at 1:38 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Aug 28, 2014 at 04:27:35PM -0700, Cong Wang wrote:
>> From: Cong Wang <cwang@xxxxxxxxxxxxxxxx>
>>
>> We saw a kernel soft lockup in perf_remove_from_context(),
>> it looks like the `perf` process, when exiting, could not go
>> out of the retry loop. Meanwhile, the target process was forking
>> a child. So either the target process should execute the smp
>> function call to deactive the event (if it was running) or it should
>> do a context switch which deactives the event.
>>
>> It seems we optimize out a context switch in perf_event_context_sched_out(),
>> and what's more important, we still test an obsolete task pointer when
>> retrying, so no one actually would deactive that event in this situation.
>> Fix it directly by reloading the task pointer in perf_remove_from_context().
>> This should fix the above soft lockup.
>
>> ---
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index f9c1ed0..c4141a0 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -1524,6 +1524,11 @@ retry:
>
> Please use either:
>
> .gitconfig:
>
> [diff "default"]
>         xfuncname = "^[[:alpha:]$_].*[^:]$"
>
> .quiltrc:
>
> QUILT_DIFF_OPTS="-F ^[[:alpha:]\$_].*[^:]\$"
>

OK, I didn't know this before.

>>                */
>>               if (ctx->is_active) {
>>                       raw_spin_unlock_irq(&ctx->lock);
>> +                     /*
>> +                      * Reload the task pointer, it might have been changed by
>> +                      * a concurrent perf_event_context_sched_out() without switching
>> +                      */
>> +                     task = ctx->task;
>>                       goto retry;
>>               }
>
> You forgot to check if that same error happened in other places (it
> does), please fix all of them.

I think you mean perf_install_in_context()? I only saw the soft lockup
in perf_remove_from_context() so far, but I can fix other places if you want.

Thanks!
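
To make the failure mode concrete, here is a minimal single-threaded user-space sketch of the retry pattern discussed above. It is not kernel code: `struct task`, `struct ctx`, `try_deactivate()` and `remove_from_context()` are hypothetical stand-ins, the SMP function call is reduced to a plain check, and the retry loop is bounded by `max_tries` so that the stuck case returns instead of spinning. It assumes the context has already been handed to another (running) task by the time the remover starts, so the cached task pointer is stale.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct task {
	int id;
	bool running;
};

struct ctx {
	struct task *task;	/* current owner of the context */
	bool is_active;
};

/*
 * Stand-in for the cross-CPU call: it only succeeds when `t` is
 * running and still owns `ctx`.
 */
static bool try_deactivate(struct ctx *ctx, struct task *t)
{
	if (!t->running || ctx->task != t)
		return false;
	ctx->is_active = false;
	return true;
}

/*
 * `task` is the pointer the remover cached earlier. Returns the number
 * of attempts needed, or -1 if the loop is still stuck after max_tries.
 */
static int remove_from_context(struct ctx *ctx, struct task *task,
			       bool reload, int max_tries)
{
	assert(max_tries > 0);

	for (int tries = 1; tries <= max_tries; tries++) {
		if (try_deactivate(ctx, task))
			return tries;
		if (!ctx->is_active)
			return tries;	/* someone else deactivated it */
		if (reload)
			task = ctx->task;	/* the proposed fix */
		/* otherwise retry with the same, possibly stale, pointer */
	}
	return -1;
}
```

With a context owned by a running child and a cached pointer to the non-running parent, the call without `reload` exhausts `max_tries` and returns -1, while the call with `reload` succeeds on the second attempt.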