On Fri, Dec 30, 2022 at 9:14 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Fri, Dec 30, 2022 at 08:42:24PM -0500, Joel Fernandes wrote:
> > Hello,
> >
> > I have been firefighting a hang on 6.0.y stable kernels with
> > rcutorture. It happens mostly consistently when TREE07 is shutting
> > down.
> >
> > It appears that the shutdown thread attempts to stop the RCU torture
> > threads, but they are constantly awakened by a timer softirq
> > handler in ksoftirqd context. When they wake up, they immediately go
> > to sleep in uninterruptible state until the next time a timer handler
> > wakes them up. It appears the timer softirq runs long enough to cause
> > RCU stalls, and I see it calling hundreds of timer function handlers
> > (call_timer_fn).
> >
> > I am doing some more investigation with trace_printk(s):
> > https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
> >
> > Regarding the timer handlers, I was wondering if it is possible that a
> > large number of constantly queued timer handlers can cause RCU stalls
> > due to the timer softirq taking a very long time. That certainly
> > appears to be the case here. Shouldn't the timer softirq also do
> > rcu_softirq_qs(), similar to the ksoftirqd loop, in case there are too
> > many of them?
>
> That is certainly a good thing to try!

I am trying something like this just for testing, let's see what happens ;-)

@@ -1788,9 +1796,14 @@ static inline void __run_timers(struct timer_base *base)
 		while (levels--)
 			expire_timers(base, heads + levels);
+
+		rcu_softirq_qs();
 	}

I guess I am also wondering why the RCU reader does not stop queuing
timers. It keeps calling schedule_timeout_interruptible() even though
the test is being stopped.

thanks,

 - Joel
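
For anyone following along, the intent of the test patch above can be
modelled in user space. This is only a rough sketch and not kernel code:
model_softirq_qs(), model_expire_timers() and model_run_timers() are
made-up stand-ins for rcu_softirq_qs(), expire_timers() and
__run_timers(), the two counters exist only so the behaviour can be
observed, and LEVELS and TIMERS_PER_LEVEL are arbitrary values. It
assumes the added call sits at the end of each pass of the outer loop in
__run_timers(), as the hunk suggests.

```c
#include <assert.h>

#define LEVELS			4	/* arbitrary number of wheel levels */
#define TIMERS_PER_LEVEL	100	/* arbitrary expired timers per level */

static unsigned long qs_reports;	/* modelled quiescent-state reports */
static unsigned long handlers_run;	/* modelled timer handlers invoked */

/* Hypothetical stand-in for rcu_softirq_qs(): only counts the call. */
static void model_softirq_qs(void)
{
	qs_reports++;
}

/* Hypothetical stand-in for expire_timers(): "runs" every handler on one level. */
static void model_expire_timers(int nr_timers)
{
	for (int i = 0; i < nr_timers; i++)
		handlers_run++;
}

/*
 * Hypothetical model of __run_timers() with the test patch applied:
 * each pass of the outer loop expires all levels, then reports one
 * quiescent state before starting the next pass.
 */
static void model_run_timers(int pending_ticks)
{
	assert(pending_ticks >= 0);

	while (pending_ticks--) {
		int levels = LEVELS;

		while (levels--)
			model_expire_timers(TIMERS_PER_LEVEL);

		model_softirq_qs();
	}
}
```

In this model, model_run_timers(3) runs 1200 handlers and records three
quiescent-state reports. Note that the report happens once per outer
pass, so a single pass with a very large number of handlers would still
go unreported until that pass finishes.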