On Sat, Dec 31, 2022 at 06:10:40PM -0500, Joel Fernandes wrote:
> On Sat, Dec 31, 2022 at 4:49 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Dec 31, 2022 at 11:46 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> [...]
> > > Hmmm...  Some of the tasks run at relatively high priority.  Maybe they
> > > need to de-prioritize themselves before looping waiting to be stopped.
> > > These loops look like this:
> > >
> > >         while (!kthread_should_stop()) {
> > >                 torture_shutdown_absorb("rcu_torture_boost");
> > >                 schedule_timeout_uninterruptible(1);
> > >         }
> >
> > Yes, it appears this tight loop is livelocked with the timer softirq.
> > I am trying a run with higher timeout to see if it helps.
> >
> > >
> > > Or it might be something else...
> >
> > I see that kthread_should_stop() returns false, but
> > torture_must_stop_irq() returns true in the tight while loop mentioned
> > above. So it seems like the shutdown notifier triggered first. I am
> > seeing various "is stopping" messages. However I see no "End-test"
> > messages, which means I think the torture_shutdown_hook() never ran
> > properly, or something. Anyway now I am doing heavy tracing in
> > rcu_torture_cleanup() to see what it is up to. My suspicion is it did
> > not even call torture_stop_kthread() and we are stuck without the
> > kthreads being stopped.
>
> Now all tests pass always if I do the following change in torture_stopping():
>
> -               schedule_timeout_uninterruptible(1);
> +               schedule_timeout_uninterruptible(50);
>
> Current theory is, the timer softirq preempts the cleanup thread
> before it can call kthread_stop().
>
> Anyway, let me know if this is an acceptable change (or not). I think
> checking for shutdown state 20 times per second instead of 1000 times
> per second is kind of reasonable.

Make that schedule_timeout_uninterruptible(HZ / 20) and sold!

							Thanx, Paul
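
For reference, the arithmetic behind the HZ / 20 suggestion can be sketched
in plain userspace C. This is only an illustration: ticks_to_ms() and
hz_over_20() are hypothetical helpers, not kernel APIs, and the HZ values
mentioned are assumed, commonly seen CONFIG_HZ settings. The timeout
argument is in jiffies, so a literal 50 sleeps about 50 ms only when
HZ=1000, while HZ / 20 stays close to 50 ms (roughly 20 checks per second)
whatever the tick rate.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): milliseconds covered by a
 * timeout of `ticks` jiffies when the tick rate is `hz` ticks per second.
 */
static long ticks_to_ms(long ticks, long hz)
{
	assert(hz > 0);
	return ticks * 1000 / hz;
}

/*
 * Hypothetical helper: the jiffies value of the expression HZ / 20,
 * using integer division as the C expression in the kernel would.
 */
static long hz_over_20(long hz)
{
	assert(hz >= 20);	/* below 20 the quotient would round down to 0 */
	return hz / 20;
}
```

With these helpers, ticks_to_ms(50, 250) gives 200, so the literal 50 would
poll only 5 times per second on an HZ=250 build, whereas
ticks_to_ms(hz_over_20(250), 250) gives 48, which stays near the intended
20 checks per second.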