On Fri, Dec 30, 2022 at 08:42:24PM -0500, Joel Fernandes wrote:
> Hello,
>
> I have been firefighting a hang on 6.0.y stable kernels with
> rcutorture. It happens fairly consistently when TREE07 is shutting
> down.
>
> It appears that the shutdown thread attempts to stop the RCU torture
> threads, but they are constantly awakened by a timer softirq handler
> in ksoftirqd context. When they wake up, they immediately go to sleep
> in uninterruptible state until the next time a timer handler wakes
> them up. It appears the timer softirq runs long enough to cause RCU
> stalls, and I see it calling hundreds of timer function handlers
> (call_timer_fn).
>
> I am doing some more investigation with trace_printk(s):
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
>
> Regarding the timer handlers, I was wondering if it is possible that a
> large number of constantly queued timer handlers can cause RCU stalls
> because the timer softirq takes a very long time. That certainly
> appears to be the case here. Shouldn't the timer softirq also call
> rcu_softirq_qs(), similar to the ksoftirqd loop, in case there are too
> many of them?
>
> Here is a full log with a trace dump if anyone wants to take a look:
> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/console.log
> And the res directory:
> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/
>
> Any thoughts on any patches 6.0 might be missing?

I can't reproduce it in v6.0 (vanilla, not stable) after 100 runs of 5 minutes,
so maybe it's actually some patches too many instead :-)

>
> Meanwhile, debugging here continues... thanks,
>
> - Joel