On Sat, Dec 31, 2022 at 10:14 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Dec 30, 2022 at 9:04 PM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > Hi Joel
> >
> > As a beginner, I am interested and I have time, could you tell me
> > where to download 6.0.16-rc2-g50f737b34ede when you are convenient?
> > And could you tell me which torture command you are invoking? I may
> > help to do some tests ;-)
>
> Sure Zhouyi, You can checkout linux-6.0.y from the stable tree [1] and
> easily reproduce it via:
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5
> --configs "25*TREE07"
> This will take 2 hours to run a total of 25 tests of TREE07. The main
> interest is in the shutdown stage.
>
> If you want to collect traces when the RCU stall triggers, you can do
> something like:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5
> --configs 20*TREE07 --bootargs
> "trace_event=sched:sched_switch,sched:sched_waking,sched:sched_wakeup,rcu:rcu_callback,rcu:rcu_fqs,rcu:rcu_grace_period
> ftrace_dump_on_oops panic_on_warn=1 sysctl.kernel.panic_on_rcu_stall=1
> sysctl.kernel.max_rcu_stall_to_panic=1" trace_buf_size=300K --kconfig
> "CONFIG_RCU_TRACE=y CONFIG_DEBUG_INFO_DWARF5=y
> CONFIG_RANDOMIZE_BASE=n"
>
> You can see these traces in a link in my last email. Refer to that
> email for the special trace_printk() patch as well if you fancy,
> though that is optional.

Thank you, Joel, for your guidance! Your instructions are clear and easy
to follow. I will do it, and I may report my test results tomorrow.

>
> As a suggestion, also avoid top-posting to mailing lists [2]

Thank you, Joel, for your guidance! Thank you for correcting my mistake,
which I have been making for years. Without your guidance, I might have
continued this inappropriate behavior forever.

Cheers
Zhouyi
>
> Cheers,
> Joel
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> [2] https://kernelnewbies.org/mailinglistguidelines
>
> >
> > Cheers
> > Zhouyi
> >
> > On Sat, Dec 31, 2022 at 9:46 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > I have been firefighting a hang on 6.0.y stable kernels with
> > > rcutorture. It happens mostly consistently when TREE07 is shutting
> > > down.
> > >
> > > It appears that the RCU torture threads are attempted to stop by the
> > > shutdown thread, but are constantly awakened by a timer softirq
> > > handler in ksoftirqd context. When they wake up, they immediately go
> > > to sleep in uninterruptible state until the next time a timer handler
> > > wakes them up. It appears the timer softirq is long enough to cause
> > > RCU stalls and I see it calling 100s of timer function handlers
> > > (call_timer_fn).
> > >
> > > I am doing some more investigation with trace_printk(s):
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
> > >
> > > Regarding the timer handlers, I was wondering if it is possible that a
> > > large number of timer handlers constantly queued can cause RCU stalls
> > > due to the timer softirq taking a very long time. That certainly
> > > appears to be the case here. Shouldn't the timer softirq also do
> > > rcu_softirq_qs() similar to the ksoftirq loop, in case there are too
> > > many of them?
> > >
> > > Here is a full log with trace dump if anyone wants to take a look:
> > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/console.log
> > > And the res directory:
> > > http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/
> > >
> > > Any thoughts on any patches 6.0 might be missing?
> > >
> > > Meanwhile, debug here continues... thanks,
> > >
> > > - Joel
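
With 20 or 25 TREE07 instances per run, it may help to find quickly which
of them actually hit a stall. The following is only a minimal sketch, not
part of the rcutorture scripts: find_stalled_runs is a hypothetical helper
name, and it assumes each instance leaves a console.log under the res
directory (as in the TREE07.2/console.log link quoted above) and that the
stall detector reports contain the words "detected stall".

```shell
#!/bin/sh
# Hypothetical helper (not part of rcutorture): print every console.log
# under a results directory that contains an RCU stall report.
# $1: a results directory, e.g. tools/testing/selftests/rcutorture/res/<timestamp>
find_stalled_runs() {
    resdir="$1"
    # Assumption: stall reports read "... self-detected stall on CPU" or
    # "... detected [expedited] stalls on CPUs/tasks"; one pattern covers all.
    # grep exits non-zero when no log matches, which is not an error here,
    # hence the "|| true".
    { find "$resdir" -name console.log \
        -exec grep -lE "detected (expedited )?stall" {} + || true; } | sort
}
```

After sourcing the function, something like
find_stalled_runs tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13
(the timestamp directory differs for every run) would list the console logs
worth opening first.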