On Tue, 2010-04-06 at 13:51 +0200, Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 13:38 +0200, Frederic Weisbecker wrote:
> > On Tue, Apr 06, 2010 at 02:50:49AM -0700, David Miller wrote:
> > > From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > > Date: Mon, 5 Apr 2010 21:40:58 +0200
> > >
> > > > It happens without CONFIG_FUNCTION_TRACER as well (but it happens
> > > > when the function tracer runs). And I hadn't your
> > > > perf_arch_save_caller_regs() when I triggered this.
> > >
> > > I figured out the problem, it's NMIs. As soon as I disable all of the
> > > NMI watchdog code, the problem goes away.
> > >
> > > This is because some parts of the NMI interrupt handling path are not
> > > marked with "notrace" and the various tracer code paths use
> > > local_irq_disable() (either directly or indirectly) which doesn't work
> > > with sparc64's NMI scheme. These essentially turn NMIs back on in the
> > > NMI handler before the NMI condition has been cleared, and thus we can
> > > re-enter with another NMI interrupt.
> > >
> > > We went through this for perf events, and we just made sure that
> > > local_irq_{enable,disable}() never occurs in any of the code paths in
> > > perf events that can be reached via the NMI interrupt handler. (the
> > > only one we had was sched_clock() and that was easily fixed)
> >
> > That reminds me we have a new pair of local_irq_disable/enable
> > in perf_event_task_output(), which path can be taken by hardware
> > pmu events.
> >
> > See this patch:
> >
> > 8bb39f9aa068262732fe44b965d7a6eb5a5a7d67
> > perf: Fix 'perf sched record' deadlock
>
> ARGH.. yes
>
> Also, I guess that should live in perf_output_lock/unlock() not in
> perf_event_task_output().
>
> Egads, how to fix that

Damn, so deadlock fix isn't a fix. No idea.

	-Mike