On Sat, Sep 23, 2017 at 07:22:04AM -0400, Steven Rostedt wrote: > On Fri, 22 Sep 2017 23:07:37 -0700 > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote: > > > OK, how about the following? > > > > It turns out that functions called from rcu_irq_enter() can > > be subject to various kinds of tracing, which can result in > > rcu_irq_enter() being invoked recursively. This recursion > > causes RCU to not have been watching when it should have, > > resulting in lockdep-RCU splats. Switching from rcu_irq_enter() > > to rcu_nmi_enter() still resulted in failures because of calls > > to rcu_irq_enter() between the rcu_nmi_enter() and its matching > > rcu_nmi_exit(). Such calls again cause RCU to not be watching > > when it should have been. > > > > In particular, the stack tracer called rcu_irq_enter() > > unconditionally, which is problematic when RCU's last > > not-watching-to-watching transition was carried out by > > rcu_nmi_enter(), as will be the case when tracing uses > > rcu_nmi_enter() to cause RCU to start watching the current CPU. > > In that case, rcu_irq_enter() actually switches RCU back to > > the not-watching state for this CPU, which results in lockdep > > splats complaining about rcu_read_lock() being used on an idle > > (not-watched) CPU. The first patch of this series addressed > > this problem by having rcu_irq_enter() and rcu_irq_exit() > > refrain from doing anything when rcu_nmi_enter() caused RCU to > > start watching this CPU. The third patch in this series caused > > save_stack_trace() to invoke rcu_nmi_enter() and rcu_nmi_exit() > > as needed, so this fourth patch now removes the rcu_irq_enter() > > and rcu_irq_exit() from within the stack tracer. > > I think it's a bit too much ;-) I believe it talks too much about the > RCU internals, and the bug will be lost on us mere mortals. > > > > > > Actually, thinking about this more, this doesn't need to go in stable. > > > As recursive rcu_irq_enter() calls should not hurt, and you now allow > > > rcu_irq_enter() to be called even after a rcu_nmi_enter() right? > > > > Yes, it is now the case that rcu_irq_enter() can be called even after > > an rcu_nmi_enter() exited idle, because rcu_irq_enter() now checks for > > this and takes an early exit if so. > > > > But what is it about older kernels prevents the tracing-induced recursive > > calls to rcu_irq_enter()? > > Ah, the bug is if rcu_irq_enter() is called, and the stack trace > triggers then. OK, lets keep it simple, and just say this. > > > Currently the stack tracer calls rcu_irq_enter() to make sure RCU > is watching when it records a stack trace. But if the stack tracer > is triggered while tracing inside of a rcu_irq_enter(), calling > rcu_irq_enter() unconditionally can be problematic. > > The reason for having rcu_irq_enter() in the first place has been > fixed from within the saving of the stack trace code, and there's no > reason for doing it in the stack tracer itself. Just remove it. > > Simple and to the point. Works for me! Thanx, Paul