On Sat, Dec 31, 2022 at 12:14:37PM -0500, Joel Fernandes wrote:
> 
> 
> > On Dec 31, 2022, at 11:52 AM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > 
> > On Sat, Dec 31, 2022 at 04:18:40PM +0000, Joel Fernandes wrote:
> >>> On Sat, Dec 31, 2022 at 4:16 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Sat, Dec 31, 2022 at 4:07 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Sat, Dec 31, 2022 at 12:28:10AM -0500, Joel Fernandes wrote:
> >>>>> On Fri, Dec 30, 2022 at 9:14 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Fri, Dec 30, 2022 at 08:42:24PM -0500, Joel Fernandes wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I have been firefighting a hang on 6.0.y stable kernels with
> >>>>>>> rcutorture. It happens mostly consistently when TREE07 is shutting
> >>>>>>> down.
> >>>>>>>
> >>>>>>> It appears that the shutdown thread is attempting to stop the RCU
> >>>>>>> torture threads, but they are constantly awakened by a timer softirq
> >>>>>>> handler running in ksoftirqd context. When they wake up, they
> >>>>>>> immediately go back to sleep in uninterruptible state until the next
> >>>>>>> time a timer handler wakes them up. It appears the timer softirq runs
> >>>>>>> long enough to cause RCU stalls, and I see it calling hundreds of
> >>>>>>> timer function handlers (call_timer_fn).
> >>>>>>>
> >>>>>>> I am doing some more investigation with trace_printk()s:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
> >>>>>>>
> >>>>>>> Regarding the timer handlers, I was wondering whether a large number
> >>>>>>> of constantly queued timer handlers could cause RCU stalls because
> >>>>>>> the timer softirq takes a very long time. That certainly appears to
> >>>>>>> be the case here. Shouldn't the timer softirq also do
> >>>>>>> rcu_softirq_qs(), similar to the ksoftirqd loop, in case there are
> >>>>>>> too many of them?
> >>>>>>
> >>>>>> That is certainly a good thing to try!
> >>>>>
> >>>>> I am trying something like this just for testing, let's see what happens ;-)
> >>>>>
> >>>>> @@ -1788,9 +1796,14 @@ static inline void __run_timers(struct timer_base *base)
> >>>>>
> >>>>> 		while (levels--)
> >>>>> 			expire_timers(base, heads + levels);
> >>>>
> >>>> Might you need to put it inside the body of this "while" loop?
> >>>
> >>> I could, but even doing the below did not help. Do you see any reason
> >>> why putting it inside the loop would help? Let me know.
> >>>
> >>>>> +		rcu_softirq_qs();
> >>>>> 	}
> >>>>>
> >>>>> I guess I am also wondering why the RCU reader does not stop queuing
> >>>>> timers. It is doing schedule_timeout_interruptible() constantly even
> >>>>> though the test is being stopped.
> >>>>
> >>>> As in it is doing schedule_timeout_interruptible() in a loop that does
> >>>> not check torture_must_stop()? That would be bad unless there is some
> >>>> reason the loop cannot just stop. But if the loop cannot just stop,
> >>>> that leads me to believe that there is a bug elsewhere. After all, given
> >>>> that the test is ending, why shouldn't all the loops be able to just stop?
> >>>
> >>> Sorry, I meant uninterruptible sleep, that is,
> >>> schedule_timeout_uninterruptible(). But I still cannot nail down where in
> >>> the torture test it is coming from.
> >>> Now I put a trace_dump_stack() directly in __schedule(), and I am
> >>> hoping it will show:
> >>>
> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>> index c2bb0cd4225d5..d79bc3b91818a 100644
> >>> --- a/kernel/sched/core.c
> >>> +++ b/kernel/sched/core.c
> >>> @@ -6390,6 +6390,8 @@ pick_next_task(struct rq *rq, struct task_struct
> >>> *prev, struct rq_flags *rf)
> >>>   *
> >>>   * WARNING: must be called with preemption disabled!
> >>>   */
> >>> +static int rcu_tort_ds;
> >>> +
> >>>  static void __sched notrace __schedule(unsigned int sched_mode)
> >>>  {
> >>> 	struct task_struct *prev, *next;
> >>> @@ -6452,6 +6454,11 @@ static void __sched notrace __schedule(unsigned
> >>> int sched_mode)
> >>> 		if (prev->sched_contributes_to_load)
> >>> 			rq->nr_uninterruptible++;
> >>> +		// Dump a stack for uninterruptible sleeps, once every 10 occurrences.
> >>> +		if (prev_state & TASK_UNINTERRUPTIBLE && ((rcu_tort_ds++ % 10) == 0)) {
> >>> +			trace_dump_stack(0);
> >>> +		}
> >>> +
> >>> 		/*
> >>
> >> OK, found the uninterruptible sleep; it is coming from here:
> >>
> >> [ 1359.592979] rcu_tort-160     15d.... 346177086us : <stack trace>
> >> [ 1359.592979]  => __ftrace_trace_stack
> >> [ 1359.592979]  => __schedule
> >> [ 1359.592979]  => schedule
> >> [ 1359.592979]  => schedule_timeout
> >> [ 1359.592979]  => torture_kthread_stopping
> >> [ 1359.592979]  => rcu_torture_reader
> >> [ 1359.592979]  => kthread
> >> [ 1359.592979]  => ret_from_fork
> >>
> >> Thoughts?
> > 
> > That is door #3, or a variation on it. That rcu_torture_reader() task
> > is waiting to be stopped in torture_kthread_stopping(). The next question
> > is "why isn't rcu_torture_cleanup() making it to the 'if (reader_tasks)'
> > code that stops the readers?"
> > 
> > Getting a stack trace of the task executing rcu_torture_cleanup() would
> > likely be helpful. Or just check for the console-log messages that it
> > already emits. These look like "Stopping rcu_torture_writer task".
> 
> Thanks for the suggestions; I will look into them. I will focus the
> debugging on the kthread-stopping part. I think we might have to check for
> signal flags if we are not doing so already; I found another patch that
> solved similar issues by doing so, and in that patch, too, there was a
> tight loop around the schedule_timeout() calls.
> 
> Happy new year! Over here I am probably going to take it easy for the rest
> of the day, but this issue is making me curious ;-)

Thank you, and Happy New Year to you and yours as well! And on the taking
it easy, sounds like a plan!!!

							Thanx, Paul
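A note on the placement Paul asks about in __run_timers(): moving the call
inside the body of the "while (levels--)" loop would report an RCU quiescent
state after each expired wheel level rather than once per pass. Against the
6.0-era code quoted in the thread, that variant would look roughly like the
hunk below. It is only an illustration of the suggestion, not a tested or
proposed fix; Joel's own placement after the loop reportedly did not help.

	@@ static inline void __run_timers(struct timer_base *base)
	-		while (levels--)
	-			expire_timers(base, heads + levels);
	+		while (levels--) {
	+			expire_timers(base, heads + levels);
	+			/* Report a quiescent state after each level's handlers. */
	+			rcu_softirq_qs();
	+		}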
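On Paul's question about a loop that keeps calling
schedule_timeout_interruptible() without checking torture_must_stop(): a
torture kthread's main loop is expected to have roughly the shape sketched
below (a simplified, hypothetical example, not the actual rcu_torture_reader()
body), so that the kthread falls through to torture_kthread_stopping() as soon
as the run is ending.

	/* Hypothetical torture-kthread loop shape, for illustration only. */
	static int example_torture_kthread(void *arg)
	{
		do {
			/* ... one iteration of test work ... */
			schedule_timeout_interruptible(1);
		} while (!torture_must_stop());
		torture_kthread_stopping("example_torture_kthread");
		return 0;
	}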
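As for the stack trace itself: torture_kthread_stopping() is where a torture
kthread parks until the cleanup code calls kthread_stop() on it. Loosely
paraphrasing the 6.0-era kernel/torture.c (details may differ), it is
essentially:

	void torture_kthread_stopping(char *title)
	{
		/* Loosely paraphrased; see kernel/torture.c for the exact code. */
		while (!kthread_should_stop()) {
			torture_shutdown_absorb(title);
			schedule_timeout_uninterruptible(1);
		}
	}

That is the schedule_timeout()-under-torture_kthread_stopping() pattern in the
trace above, and it is why the question becomes what is keeping
rcu_torture_cleanup() from reaching the kthread_stop() calls for the readers.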