On 07/07/2010 04:57 AM, Thomas Gleixner wrote:
> Cc'ing Darren.
>
> On Wed, 7 Jul 2010, Mike Galbraith wrote:
>
>> Greetings,
>>
>> Stress testing, looking to trigger RCU stalls, I've managed to find a
>> way to repeatably create fireworks. (got RCU stall, see attached)

<snip>
embarrassing ltp realtime/perf/latency/pthread_cond_many breakage
</snip>

>> 4. run it.
>>
>> What happens here is we hit WARN_ON(pendowner->pi_blocked_on != waiter),
>> this does not make it to consoles (poking sysrq-foo doesn't either).
>> Next comes WARN_ON(!pendowner->pi_blocked_on), followed by the NULL
>> explosion, which does make it to consoles.

So the WARN_ON sequence is obviously wrong: if the condition is
critical it should be a BUG(); if not, we shouldn't dereference what we
know to be NULL.

The following patch avoids the NULL pointer dereference in the WARN_ON.
With this patch the NULL WARN_ON makes it to the console, and the test
runs to completion with no obvious negative side effects. I'm only
posting it for reference at this point: while this may be necessary, it
isn't the right "solution".

Some other data points from what time I could spend on this today:

The FC12 kernel (2.6.31 based) has Requeue PI support, but does not
exhibit this behavior.
2.6.33.5-rt23 without CONFIG_PREEMPT_RT does NOT exhibit this behavior.
2.6.33.5-rt23 does exhibit this behavior.

Even the minimal tracing I attempted (a handful of trace_printk's, run
with the nop plugin) prevented the crash from happening.

There appears to be no correlation between pi_blocked_on being NULL and
the next pointer being NULL (I saw a roughly equal mix of NULL and
valid pointers for next when pi_blocked_on was NULL).

Tonight/Tomorrow I'll review the rtmutex and futex code to try to fully
understand (again) the usage of pi_blocked_on, and whether we need to
avoid this scenario or handle it "gracefully".
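To make the ordering problem above concrete, here is a small userspace
sketch of the guarded form. It is only an illustration under stated
assumptions: the structs are hypothetical stand-ins, not the kernel's
task_struct or rt_mutex_waiter, and warn_on(), warn_count and
check_pendowner() are names made up for this example. The only kernel
behavior it relies on is that WARN_ON() evaluates to the truth value of
its condition, which is what lets it sit inside an if ().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; NOT the kernel definitions. */
struct lock { int id; };
struct waiter { struct lock *lock; };
struct task { struct waiter *pi_blocked_on; };

static int warn_count;	/* number of warnings fired so far */

/* Report a failed check and hand the condition back to the caller. */
static int warn_on(int cond, const char *expr)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}

/* Like the kernel macro, this evaluates to the condition's truth value. */
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/*
 * Run the checks in the guarded order and return how many fired.
 * The ->lock dereference only happens once pi_blocked_on is known
 * to be non-NULL, so a NULL pointer produces warnings, not a crash.
 */
static int check_pendowner(struct task *pendowner, struct waiter *waiter,
			   struct lock *lock)
{
	int before = warn_count;

	WARN_ON(pendowner->pi_blocked_on != waiter);
	if (!WARN_ON(!pendowner->pi_blocked_on))
		WARN_ON(pendowner->pi_blocked_on->lock != lock);

	pendowner->pi_blocked_on = NULL;
	return warn_count - before;
}
```

Calling check_pendowner() on a task whose pi_blocked_on is NULL fires the
first two checks and skips the third, whereas the unguarded sequence
would dereference the NULL pointer in the third check.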

From fa6a6bee6e467d12d3774612c838703acd265ea6 Mon Sep 17 00:00:00 2001
From: Darren Hart <dvhltc@xxxxxxxxxx>
Date: Thu, 8 Jul 2010 19:44:35 -0400
Subject: [PATCH] rtmutex: avoid warnon bug

Signed-off-by: Darren Hart <dvhltc@xxxxxxxxxx>
---
 kernel/rtmutex.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 23dd443..a2fcaa5 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -579,9 +579,9 @@ static void wakeup_next_waiter(struct rt_mutex *lock, int savestate)
 
 	raw_spin_lock(&pendowner->pi_lock);
 
-	WARN_ON(!pendowner->pi_blocked_on);
 	WARN_ON(pendowner->pi_blocked_on != waiter);
-	WARN_ON(pendowner->pi_blocked_on->lock != lock);
+	if (!WARN_ON(!pendowner->pi_blocked_on))
+		WARN_ON(pendowner->pi_blocked_on->lock != lock);
 
 	pendowner->pi_blocked_on = NULL;
 
-- 
1.6.5.2

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team