On Fri, Sep 22, 2017 at 09:27:39PM -0400, Steven Rostedt wrote:
> On Fri, 22 Sep 2017 15:54:55 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Sep 22, 2017 at 06:15:47PM -0400, Steven Rostedt wrote:
> > > From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
> > >
> > > While debugging some RCU issues with the stack tracer, it was discovered
> > > that the problem was much more than with the stack tracer itself, but with
> > > the saving of the stack trace, which could happen from any WARN() as well.
> > > The problem was fixed within kernel_text_address().
> > >
> > > One of the bugs that was discovered was that the stack tracer called
> > > rcu_irq_enter() unconditionally.  Paul McKenney said that could cause
> > > issues as well.  Instead of adding logic to only call rcu_irq_enter() if
> > > RCU is not watching from within the stack tracer, since the core issue has
> > > been fixed (within save_stack_trace()), we can simply remove all the logic
> > > in the stack tracer that deals with RCU workarounds.
> >
> > I must confess that I am having some difficulty parsing this paragraph,
> > especially the last sentence...
> >
> > Does this capture it?
> >
> > 	One problem is that the stack tracer called rcu_irq_enter()
> > 	unconditionally, which is problematic if RCU's last
> > 	not-watching-to-watching transition was carried out by
> > 	rcu_nmi_enter().  In that case, rcu_irq_enter() actually switches
>
> I thought the rcu_irq_enter() after rcu_nmi_enter() was a separate bug.
> Your original complaint was that I called rcu_irq_enter()
> unconditionally, and wanted me to only call it if RCU wasn't watching.
>
> But, the new code could possibly have this get called after
> rcu_nmi_enter() because we are calling it without in_nmi() being set.

You are correct.  The initial bug was rcu_irq_enter() being invoked
recursively due to tracing of some of its called functions, which caused
RCU to not have been watching when it should have.  I then advised you
to switch to rcu_nmi_enter(), unaware that there would be calls to
rcu_irq_enter() between the rcu_nmi_enter() and its matching
rcu_nmi_exit().  That led us to the second problem, described in my
suggested paragraph.

> > 	RCU back to the not-watching state for this CPU, which results
> > 	in lockdep splats complaining about rcu_read_lock() being
> > 	used on an idle (not-watched) CPU.  The first patch of this
> > 	series addressed this problem by having rcu_irq_enter() and
> > 	rcu_irq_exit() refrain from doing anything when rcu_nmi_enter()
> > 	caused RCU to start watching this CPU.  The third patch in this
> > 	series caused save_stack_trace() to invoke rcu_nmi_enter() and
> > 	rcu_nmi_exit() as needed, so this fourth patch now removes the
> > 	rcu_irq_enter() and rcu_irq_exit() from within the stack tracer.
> >
> > One further question...  Can I now remove the rcu_irq_enter_disabled()
> > logic?
>
> After this goes in, yes.  But that doesn't need to be a stable change.

Good, at least a little simplification out of this.  ;-)

> > > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
> > > Suggested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> >
> > With the hard-to-parse paragraph fixed:
> >
> > Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> Thanks, but we may need to go back and forth a bit to get the change
> log correct.
>
> Remember the first bug.
> The one that was fixed by changing rcu_irq_enter() to rcu_nmi_enter()?
>
> =============================
> WARNING: suspicious RCU usage
> 4.13.0-rc7-test+ #117 Tainted: G        W
> -----------------------------
> /work/git/linux-trace.git/arch/x86/kernel/traps.c:305 entry code didn't wake RCU!
>
> other info that might help us debug this:
>
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 2, debug_locks = 1
> RCU used illegally from extended quiescent state!
> no locks held by swapper/1/0.
>
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.13.0-rc7-test+ #117
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
>  dump_stack+0x86/0xcf
>  lockdep_rcu_suspicious+0xc5/0x100
>  do_error_trap+0x125/0x130
>  ? do_error_trap+0x5/0x130
>  ? trace_hardirqs_off_thunk+0x1a/0x1c
>  ? do_invalid_op+0x5/0x30
>  do_invalid_op+0x20/0x30
>  invalid_op+0x1e/0x30
> RIP: 0010:module_assert_mutex_or_preempt+0x34/0x40
> RSP: 0018:ffffc900006abc58 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffffffffa000a077 RCX: 0000000000000002
> RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000046
> RBP: ffffc900006abc58 R08: ffffc900006abf40 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff8801188d8040 R15: ffffffff81ed5720
>  ? 0xffffffffa000a077
>  ? module_assert_mutex_or_preempt+0x30/0x40
>  __module_address+0x2c/0xf0
>  ? 0xffffffffa000a077
>  __module_text_address+0x12/0x60
>  ? 0xffffffffa000a077
>  is_module_text_address+0x1f/0x50
>  ? 0xffffffffa000a077
>  __kernel_text_address+0x30/0x90
>  unwind_get_return_address+0x1f/0x30
>  __save_stack_trace+0x83/0xd0
>  ? 0xffffffffa000a077
>  ? rcu_dynticks_eqs_exit+0x5/0x40
>  save_stack_trace+0x1b/0x20
>  check_stack+0xf8/0x2f0
>  ? rcu_dynticks_eqs_enter+0x30/0x30
>  stack_trace_call+0x6e/0x80
>  0xffffffffa000a077
>  ? ftrace_graph_caller+0x78/0xa8
>  ? rcu_dynticks_eqs_exit+0x5/0x40
>  rcu_dynticks_eqs_exit+0x5/0x40
>  rcu_idle_exit+0xdf/0xf0
>  ? rcu_dynticks_eqs_exit+0x5/0x40
>  ? rcu_idle_exit+0xdf/0xf0
>  do_idle+0x128/0x200
>  cpu_startup_entry+0x1d/0x20
>  start_secondary+0x108/0x130
>  secondary_startup_64+0x9f/0x9f
>
> This was caused by just using rcu_irq_enter(). Not sure if this will
> still be an issue or not. But because we now add an rcu_nmi_enter()
> without being in_nmi(), we probably should do this. This code doesn't
> run if in_nmi() is true, but it could run from the stack trace dump
> itself, and that now calls rcu_nmi_enter().

OK, how about the following?

	It turns out that functions called from rcu_irq_enter() can be
	subject to various kinds of tracing, which can result in
	rcu_irq_enter() being invoked recursively.  This recursion causes
	RCU to not have been watching when it should have, resulting in
	lockdep-RCU splats.  Switching from rcu_irq_enter() to
	rcu_nmi_enter() still resulted in failures because of calls to
	rcu_irq_enter() between the rcu_nmi_enter() and its matching
	rcu_nmi_exit().  Such calls again cause RCU to not be watching
	when it should have been.

	In particular, the stack tracer called rcu_irq_enter()
	unconditionally, which is problematic when RCU's last
	not-watching-to-watching transition was carried out by
	rcu_nmi_enter(), as will be the case when tracing uses
	rcu_nmi_enter() to cause RCU to start watching the current CPU.
	In that case, rcu_irq_enter() actually switches RCU back to the
	not-watching state for this CPU, which results in lockdep splats
	complaining about rcu_read_lock() being used on an idle
	(not-watched) CPU.  The first patch of this series addressed this
	problem by having rcu_irq_enter() and rcu_irq_exit() refrain from
	doing anything when rcu_nmi_enter() caused RCU to start watching
	this CPU.  The third patch in this series caused save_stack_trace()
	to invoke rcu_nmi_enter() and rcu_nmi_exit() as needed, so this
	fourth patch now removes the rcu_irq_enter() and rcu_irq_exit()
	from within the stack tracer.

> Actually, thinking about this more, this doesn't need to go in stable.
> As recursive rcu_irq_enter() calls should not hurt, and you now allow
> rcu_irq_enter() to be called even after a rcu_nmi_enter(), right?

Yes, it is now the case that rcu_irq_enter() can be called even after
an rcu_nmi_enter() exited idle, because rcu_irq_enter() now checks for
this and takes an early exit if so.

But what is it about older kernels that prevents the tracing-induced
recursive calls to rcu_irq_enter()?

							Thanx, Paul
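
For readers following the discussion, here is a minimal sketch of the
pattern being described: entering the RCU-watching state only when
needed around an address lookup that requires RCU, which is the general
approach taken for save_stack_trace()/kernel_text_address().  This is
illustrative only, not the actual patch; the function name
example_text_address_lookup() and the exact headers are assumptions.

/*
 * Minimal sketch of the pattern discussed above (not the actual patch).
 * If RCU is not already watching, treat the call site like an NMI and
 * bracket the lookup with rcu_nmi_enter()/rcu_nmi_exit(), since walking
 * the module list requires RCU to be watching.  Header choices are
 * approximate.
 */
#include <linux/rcupdate.h>	/* rcu_is_watching() and friends */
#include <linux/module.h>	/* is_module_text_address() */

static bool example_text_address_lookup(unsigned long addr)
{
	bool no_rcu = !rcu_is_watching();
	bool ret;

	if (no_rcu)
		rcu_nmi_enter();	/* safe from any context */

	ret = is_module_text_address(addr);

	if (no_rcu)
		rcu_nmi_exit();

	return ret;
}

The point of checking rcu_is_watching() first is that the caller only
pays for the rcu_nmi_enter()/rcu_nmi_exit() pair when RCU was in an
extended quiescent state, and the NMI-style entry nests safely no matter
which context the stack trace came from.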