On Thu, Apr 22, 2021 at 04:45:00PM +0100, John Garry wrote:
> On 22/04/2021 15:35, Paul E. McKenney wrote:
> > On Thu, Apr 22, 2021 at 10:20:51AM +0100, John Garry wrote:
> > > Hi RCU experts,
> > >
>
> Thanks Paul
>
> > > Recently I have noticed that I can trigger an RCU stall quite easily on my
> > > system under specific conditions.
> > >
> > > I have a fair idea why it happens, but need to analyze a proper solution
> > > further. It looks like a hard IRQ handler and threaded part are tied to
> > > specific CPU and getting swamped and not relinquishing.

I should hasten to confirm that saturating a CPU with interrupts can
also result in RCU CPU stall warnings, so please do continue your
efforts fixing this as well.

> > > However, mixed in the RCU splats, I have noticed many BUG logs, like:
> > >
> > > [ 207.788748] BUG: spinlock recursion on CPU#46, fio/1470
> > This is a self-deadlock. Given that deadlock, and given that spinlocks
> > disable preemption, the RCU CPU stall warnings are expected behavior.
> > After all, your code really is grabbing a CPU by the throat and shaking
> > it indefinitely.
> >
> > Please build your kernel with CONFIG_PROVE_LOCKING=y and then fix the
> > issues it calls out. Then please also fix the bugs resulting in the
> > "sleeping function called from invalid context" and in the "scheduling
> > while atomic".
>
> Here's the rub, the issue goes away with CONFIG_PROVE_LOCKING and all the
> extra debugging it adds.

Hmmm. That can happen. You have enough going on that fixing what you
already know about might eventually get things to where
CONFIG_PROVE_LOCKING does something useful to you.

Thanx, Paul

> But I get the point that these are separate and need to be fixed also.
>
> >
> > In addition, there are quite a few idle tasks called out in your list of
> > stalled CPUs. This is often due to RCU's grace-period kthread (named
> > "rcu_preempt" in this case) not getting any CPU time. This is not
> > unexpected given the "RT throttling activated". If you are going to run
> > code at real-time priorities, you must ensure that any number of kernel
> > kthreads get the CPU time that they need. As Spiderman's uncle said
> > "With great power comes great responsibility".
>
> OK, I need to check on that separately also.
>
> Cheers,
> John