On Sat, Dec 31, 2022 at 4:49 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Dec 31, 2022 at 11:46 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> [...]
> > Hmmm...  Some of the tasks run at relatively high priority.  Maybe they
> > need to de-prioritize themselves before looping waiting to be stopped.
> > These loops look like this:
> >
> >         while (!kthread_should_stop()) {
> >                 torture_shutdown_absorb("rcu_torture_boost");
> >                 schedule_timeout_uninterruptible(1);
> >         }
>
> Yes, it appears this tight loop is livelocked with the timer softirq.
> I am trying a run with a higher timeout to see if it helps.
>
> >
> > Or it might be something else...
>
> I see that kthread_should_stop() returns false, but
> torture_must_stop_irq() returns true in the tight while loop mentioned
> above. So it seems like the shutdown notifier triggered first. I am
> seeing various "is stopping" messages. However, I see no "End-test"
> messages, which I think means torture_shutdown_hook() never ran
> properly, or something. Anyway, I am now doing heavy tracing in
> rcu_torture_cleanup() to see what it is up to. My suspicion is that it
> did not even call torture_stop_kthread() and we are stuck without the
> kthreads being stopped.

All tests now pass consistently if I make the following change in
torture_stopping():

-               schedule_timeout_uninterruptible(1);
+               schedule_timeout_uninterruptible(50);

The current theory is that the timer softirq preempts the cleanup thread
before it can call kthread_stop(). Anyway, let me know if this is an
acceptable change (or not). I think checking for the shutdown state 20
times per second instead of 1000 times per second (assuming HZ=1000) is
reasonable.

thanks,

 - Joel
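
For reference, the 20-versus-1000 figures above come from dividing the tick
rate by the timeout in jiffies. The user-space sketch below just redoes that
arithmetic. The helper names polls_per_second() and sleep_ms() are
hypothetical (they are not kernel APIs), and it assumes each loop iteration
does negligible work besides sleeping.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel API: how many times per second a
 * polling loop wakes up if each iteration sleeps for timeout_jiffies
 * ticks. hz stands in for the kernel tick rate (CONFIG_HZ).
 */
static inline unsigned int polls_per_second(unsigned int hz,
					    unsigned int timeout_jiffies)
{
	assert(timeout_jiffies > 0);
	return hz / timeout_jiffies;
}

/* Hypothetical helper: length of one sleep, in milliseconds. */
static inline unsigned int sleep_ms(unsigned int hz,
				    unsigned int timeout_jiffies)
{
	assert(hz > 0);
	return timeout_jiffies * 1000u / hz;
}
```

With hz = 1000, a timeout of 1 gives 1000 polls per second and a timeout of
50 gives 20. With hz = 250, the same 50-jiffy timeout would give only 5
polls per second (200 ms per sleep), so if a fixed rate is wanted regardless
of the configured tick rate, something like HZ / 20 might be a safer
argument than a bare 50.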