Re: RCU stalls with TREE07 on v6.0 kernel

On Sun, Jan 1, 2023 at 12:16 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sat, Dec 31, 2022 at 06:10:40PM -0500, Joel Fernandes wrote:
> > On Sat, Dec 31, 2022 at 4:49 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Sat, Dec 31, 2022 at 11:46 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > [...]
> > > > Hmmm...  Some of the tasks run at relatively high priority.  Maybe they
> > > > need to de-prioritize themselves before looping waiting to be stopped.
> > > > These loops look like this:
> > > >
> > > >         while (!kthread_should_stop()) {
> > > >                 torture_shutdown_absorb("rcu_torture_boost");
> > > >                 schedule_timeout_uninterruptible(1);
> > > >         }
> > >
> > > Yes, it appears this tight loop is livelocked with the timer softirq.
> > > I am trying a run with higher timeout to see if it helps.
> > >
> > > >
> > > > Or it might be something else...
> > >
> > > I see that kthread_should_stop() returns false, but
> > > torture_must_stop_irq() returns true in the tight while loop mentioned
> > > above. So it seems like the shutdown notifier triggered first. I am
> > > seeing various "is stopping" messages. However I see no "End-test"
> > > messages, which means I think the torture_shutdown_hook() never ran
> > > properly, or something. Anyway now I am doing heavy tracing in
> > > rcu_torture_cleanup() to see what it is up to. My suspicion is it did
> > > not even call torture_stop_kthread() and we are stuck without the
> > > kthreads being stopped.
> >
> > Now all tests pass always if I do the following change in torture_kthread_stopping():
> >
> > - schedule_timeout_uninterruptible(1);
> > + schedule_timeout_uninterruptible(50);
> >
> > Current theory is, the timer softirq preempts the cleanup thread
> > before it can call kthread_stop().
> >
> > Anyway, let me know if this is an acceptable change (or not). I think
> > checking for shutdown state 20 times per second instead of 1000 times
> > per second is kind of reasonable.
>
> Make that schedule_timeout_uninterruptible(HZ / 20) and sold!

Thanks! I'll send a patch shortly. I think I have nailed it down. The
problem is that fullstop is set to FULLSTOP_RMMOD before kthread_stop()
is called. This causes all the rcutorture threads to enter the tight
loop in torture_kthread_stopping(). Furthermore, this can happen in a
thundering-herd fashion: every thread keeps queueing timers, and the
resulting timer softirq load stalls a writer that just happened to be
executing synchronize.

I just did 100 runs with HZ / 20, and all of them passed.

Patch on the way. Thanks,

 - Joel
