On Sat, Dec 31, 2022 at 12:14:37PM -0500, Joel Fernandes wrote:
> 
> 
> > On Dec 31, 2022, at 11:52 AM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > 
> > On Sat, Dec 31, 2022 at 04:18:40PM +0000, Joel Fernandes wrote:
> >>> On Sat, Dec 31, 2022 at 4:16 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Sat, Dec 31, 2022 at 4:07 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Sat, Dec 31, 2022 at 12:28:10AM -0500, Joel Fernandes wrote:
> >>>>> On Fri, Dec 30, 2022 at 9:14 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Fri, Dec 30, 2022 at 08:42:24PM -0500, Joel Fernandes wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I have been firefighting a hang on 6.0.y stable kernels with
> >>>>>>> rcutorture. It happens mostly consistently when TREE07 is shutting
> >>>>>>> down.
> >>>>>>>
> >>>>>>> It appears that the shutdown thread is attempting to stop the RCU
> >>>>>>> torture threads, but they are constantly awakened by a timer softirq
> >>>>>>> handler running in ksoftirqd context. When they wake up, they
> >>>>>>> immediately go back to sleep in uninterruptible state until the next
> >>>>>>> time a timer handler wakes them up. It appears the timer softirq runs
> >>>>>>> long enough to cause RCU stalls, and I see it calling hundreds of
> >>>>>>> timer function handlers (call_timer_fn).
> >>>>>>>
> >>>>>>> I am doing some more investigation with trace_printk()s:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
> >>>>>>>
> >>>>>>> Regarding the timer handlers, I was wondering whether a large number
> >>>>>>> of constantly queued timer handlers could cause RCU stalls because
> >>>>>>> the timer softirq takes a very long time. That certainly appears to
> >>>>>>> be the case here. Shouldn't the timer softirq also do
> >>>>>>> rcu_softirq_qs(), similar to the ksoftirqd loop, in case there are
> >>>>>>> too many of them?
> >>>>>>
> >>>>>> That is certainly a good thing to try!
> >>>>>
> >>>>> I am trying something like this just for testing, let's see what happens ;-)
> >>>>>
> >>>>> @@ -1788,9 +1796,14 @@ static inline void __run_timers(struct timer_base *base)
> >>>>>
> >>>>> 		while (levels--)
> >>>>> 			expire_timers(base, heads + levels);
> >>>>
> >>>> Might you need to put it inside the body of this "while" loop?
> >>>
> >>> I could, but even doing the below did not help. Do you see any reason
> >>> why putting it inside the loop would help? Let me know.
> >>>
> >>>>> +		rcu_softirq_qs();
> >>>>> 	}
> >>>>>
> >>>>> I guess I am also wondering why the RCU reader does not stop queuing
> >>>>> timers. It is doing schedule_timeout_interruptible() constantly even
> >>>>> though the test is being stopped.
> >>>>
> >>>> As in it is doing schedule_timeout_interruptible() in a loop that does
> >>>> not check torture_must_stop()? That would be bad unless there is some
> >>>> reason the loop cannot just stop. But if the loop cannot just stop,
> >>>> that leads me to believe that there is a bug elsewhere. After all, given
> >>>> that the test is ending, why shouldn't all the loops be able to just stop?
> >>>
> >>> Sorry, I meant uninterruptible sleep, that is,
> >>> schedule_timeout_uninterruptible(). But I still cannot nail down where in
> >>> the torture test it is coming from.
> >>> Now I put a trace_dump_stack() directly in __schedule(), and I am
> >>> hoping it will show:
> >>>
> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>> index c2bb0cd4225d5..d79bc3b91818a 100644
> >>> --- a/kernel/sched/core.c
> >>> +++ b/kernel/sched/core.c
> >>> @@ -6390,6 +6390,8 @@ pick_next_task(struct rq *rq, struct task_struct
> >>> *prev, struct rq_flags *rf)
> >>>   *
> >>>   * WARNING: must be called with preemption disabled!
> >>>   */
> >>> +static int rcu_tort_ds;
> >>> +
> >>>  static void __sched notrace __schedule(unsigned int sched_mode)
> >>>  {
> >>> 	struct task_struct *prev, *next;
> >>> @@ -6452,6 +6454,11 @@ static void __sched notrace __schedule(unsigned
> >>> int sched_mode)
> >>> 		if (prev->sched_contributes_to_load)
> >>> 			rq->nr_uninterruptible++;
> >>> +		// Dump a stack for uninterruptible sleeps, once every 10 occurrences.
> >>> +		if (prev_state & TASK_UNINTERRUPTIBLE && ((rcu_tort_ds++ % 10) == 0)) {
> >>> +			trace_dump_stack(0);
> >>> +		}
> >>> +
> >>> 		/*
> >>
> >> OK, found the uninterruptible sleep; it is coming from here:
> >>
> >> [ 1359.592979] rcu_tort-160     15d.... 346177086us : <stack trace>
> >> [ 1359.592979]  => __ftrace_trace_stack
> >> [ 1359.592979]  => __schedule
> >> [ 1359.592979]  => schedule
> >> [ 1359.592979]  => schedule_timeout
> >> [ 1359.592979]  => torture_kthread_stopping
> >> [ 1359.592979]  => rcu_torture_reader
> >> [ 1359.592979]  => kthread
> >> [ 1359.592979]  => ret_from_fork
> >>
> >> Thoughts?
> > 
> > That is door #3, or a variation on it. That rcu_torture_reader() task
> > is waiting to be stopped in torture_kthread_stopping(). The next question
> > is "why isn't rcu_torture_cleanup() making it to the 'if (reader_tasks)'
> > code that stops the readers?"
> > 
> > Getting a stack trace of the task executing rcu_torture_cleanup() would
> > likely be helpful. Or just check for the console-log messages that it
> > already emits. These look like "Stopping rcu_torture_writer task".
> 
> Thanks for the suggestions; I will look into them. I will focus the
> debugging on the kthread-stopping part. I think we might have to check for
> signal flags if we are not doing so already; I found another patch that
> solved similar issues by doing so, and in that patch, too, there was a
> tight loop around the schedule_timeout() calls.
> 
> Happy new year! Over here I am probably going to take it easy for the rest
> of the day, but this issue is making me curious ;-)

Thank you, and Happy New Year to you and yours as well! And on the taking
it easy, sounds like a plan!!!

							Thanx, Paul
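A note on the placement Paul asks about in __run_timers(): moving the call
inside the body of the "while (levels--)" loop would report an RCU quiescent
state after each expired wheel level rather than once per pass. Against the
6.0-era code quoted in the thread, that variant would look roughly like the
hunk below. It is only an illustration of the suggestion, not a tested or
proposed fix; Joel's own placement after the loop reportedly did not help.

	@@ static inline void __run_timers(struct timer_base *base)
	-		while (levels--)
	-			expire_timers(base, heads + levels);
	+		while (levels--) {
	+			expire_timers(base, heads + levels);
	+			/* Report a quiescent state after each level's handlers. */
	+			rcu_softirq_qs();
	+		}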
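On Paul's question about a loop that keeps calling
schedule_timeout_interruptible() without checking torture_must_stop(): a
torture kthread's main loop is expected to have roughly the shape sketched
below (a simplified, hypothetical example, not the actual rcu_torture_reader()
body), so that the kthread falls through to torture_kthread_stopping() as soon
as the run is ending.

	/* Hypothetical torture-kthread loop shape, for illustration only. */
	static int example_torture_kthread(void *arg)
	{
		do {
			/* ... one iteration of test work ... */
			schedule_timeout_interruptible(1);
		} while (!torture_must_stop());
		torture_kthread_stopping("example_torture_kthread");
		return 0;
	}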
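As for the stack trace itself: torture_kthread_stopping() is where a torture
kthread parks until the cleanup code calls kthread_stop() on it. Loosely
paraphrasing the 6.0-era kernel/torture.c (details may differ), it is
essentially:

	void torture_kthread_stopping(char *title)
	{
		/* Loosely paraphrased; see kernel/torture.c for the exact code. */
		while (!kthread_should_stop()) {
			torture_shutdown_absorb(title);
			schedule_timeout_uninterruptible(1);
		}
	}

That is the schedule_timeout()-under-torture_kthread_stopping() pattern in the
trace above, and it is why the question becomes what is keeping
rcu_torture_cleanup() from reaching the kthread_stop() calls for the readers.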