Re: RCU stalls with TREE07 on v6.0 kernel

On Sat, Dec 31, 2022 at 10:14 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Dec 30, 2022 at 9:04 PM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > Hi Joel
> >
> > As a beginner, I am interested and have some time. Could you tell me
> > where to download 6.0.16-rc2-g50f737b34ede when convenient? And
> > could you tell me which torture command you are invoking? I may be
> > able to help with some tests ;-)
>
> Sure Zhouyi, you can check out linux-6.0.y from the stable tree [1] and
> easily reproduce it via:
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "25*TREE07"
> This will take 2 hours to run a total of 25 tests of TREE07. The main
> interest is in the shutdown stage.
>
> If you want to collect traces when the RCU stall triggers, you can do
> something like:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 \
>   --configs "20*TREE07" \
>   --bootargs "trace_event=sched:sched_switch,sched:sched_waking,sched:sched_wakeup,rcu:rcu_callback,rcu:rcu_fqs,rcu:rcu_grace_period ftrace_dump_on_oops panic_on_warn=1 sysctl.kernel.panic_on_rcu_stall=1 sysctl.kernel.max_rcu_stall_to_panic=1 trace_buf_size=300K" \
>   --kconfig "CONFIG_RCU_TRACE=y CONFIG_DEBUG_INFO_DWARF5=y CONFIG_RANDOMIZE_BASE=n"
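
With ftrace_dump_on_oops set, the trace buffer is printed inline in each run's console.log when the stall panics the guest. Below is a minimal shell sketch for pulling out only that dump. It assumes the dump begins with a "Dumping ftrace buffer:" line followed by a dashed separator line and ends with a second dashed separator line, so please check that against your own log. extract_ftrace_dump is a hypothetical helper and is not part of the rcutorture scripts.

```shell
# Hypothetical helper: print only the ftrace dump section of a console.log.
# Assumption: the dump opens with "Dumping ftrace buffer:" followed by a
# dashed separator line, and closes with a second dashed separator line.
extract_ftrace_dump() {
	awk '
		# Start printing at the dump header and reset the separator count.
		/Dumping ftrace buffer:/ { in_dump = 1; seps = 0 }
		in_dump { print }
		# Stop after the closing (second) dashed separator line.
		in_dump && /-----------------/ { seps++; if (seps == 2) in_dump = 0 }
	' "$1"
}
```

For example: `extract_ftrace_dump tools/testing/selftests/rcutorture/res/<timestamp>/TREE07.2/console.log > TREE07.2-trace.txt`.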

Preliminary test results on Linux 6.0.16-rc2 (git head f54b936f8ec7):

1) Dell PowerEdge R270, two Intel(R) Xeon(R) CPU E5-2660, 128G memory:
   80 (4*20) rounds of TREE07 in total
   77 finished without error
   3 reported "rcu: INFO: rcu_sched detected stalls on CPUs/tasks" [1]

2) Lenovo ThinkPad P1 Gen 4, i7-11800H, 64G memory:
   100 (5*20) rounds of TREE07 in total
   all 100 finished without error

[1] http://154.220.3.115/logs/20230101/console.log
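
Tallies like the ones above can be collected with a small shell sketch that counts stall reports across kvm.sh result directories. It assumes the res/<timestamp>/TREE07.N/console.log layout shown in the log URLs in this thread, and it matches the stall message quoted above. count_stalls is a hypothetical helper and is not part of the rcutorture scripts.

```shell
# Hypothetical helper: count TREE07 runs that printed an RCU stall warning.
# Assumes kvm.sh results are laid out as <res_dir>/<timestamp>/TREE07*/console.log.
count_stalls() {
	res_dir=${1:-tools/testing/selftests/rcutorture/res}
	total=0
	stalled=0
	for log in "$res_dir"/*/TREE07*/console.log; do
		# Skip the literal pattern left behind when nothing matches.
		[ -f "$log" ] || continue
		total=$((total + 1))
		if grep -q "detected stalls on CPUs/tasks" "$log"; then
			stalled=$((stalled + 1))
			echo "stall: $log"
		fi
	done
	echo "total=$total stalled=$stalled clean=$((total - stalled))"
}
```

For example: `count_stalls tools/testing/selftests/rcutorture/res`. A match only shows that a stall warning was printed. Each flagged console.log still has to be read to confirm whether the stall occurred during shutdown.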

I learned a lot during this process and am very interested in this
problem; there is still much for me to learn.

Happy New Year to you, Paul and all RCUers ;-)

Cheers
Zhouyi
>
> You can see these traces in a link in my last email. Refer to that
> email for the special trace_printk() patch as well if you fancy,
> though that is optional.
>
> As a suggestion, also avoid top-posting to mailing lists [2].
>
> Cheers,
> Joel
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> [2] https://kernelnewbies.org/mailinglistguidelines
>
> >
> > Cheers
> > Zhouyi
> >
> > On Sat, Dec 31, 2022 at 9:46 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > I have been firefighting a hang on 6.0.y stable kernels with
> > > rcutorture. It happens fairly consistently when TREE07 is shutting
> > > down.
> > >
> > > It appears that the shutdown thread attempts to stop the RCU torture
> > > threads, but they are constantly awakened by a timer softirq handler
> > > in ksoftirqd context. When they wake up, they immediately go to sleep
> > > in uninterruptible state until the next time a timer handler wakes
> > > them up. It appears the timer softirq runs long enough to cause RCU
> > > stalls, and I see it calling hundreds of timer function handlers
> > > (call_timer_fn).
> > >
> > > I am doing some more investigation with trace_printk() calls:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
> > >
> > > Regarding the timer handlers, I was wondering whether a large number
> > > of constantly queued timer handlers can cause RCU stalls because the
> > > timer softirq takes a very long time. That certainly appears to be
> > > the case here. Shouldn't the timer softirq also call rcu_softirq_qs(),
> > > similar to the ksoftirqd loop, in case there are too many of them?
> > >
> > > Here is a full log with a trace dump if anyone wants to take a look:
> > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/console.log
> > > And the res directory:
> > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/
> > >
> > > Any thoughts on any patches 6.0 might be missing?
> > >
> > > Meanwhile, debug here continues... thanks,
> > >
> > >  - Joel


