Re: RCU stalls with TREE07 on v6.0 kernel

> On Dec 31, 2022, at 12:27 PM, Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> 
> On Fri, Dec 30, 2022 at 08:42:24PM -0500, Joel Fernandes wrote:
>> Hello,
>> 
>> I have been firefighting a hang on 6.0.y stable kernels with
>> rcutorture. It happens fairly consistently when TREE07 is shutting
>> down.
>> 
>> It appears that the shutdown thread attempts to stop the RCU torture
>> threads, but they are constantly awakened by a timer softirq
>> handler in ksoftirqd context. When they wake up, they immediately go
>> back to sleep in uninterruptible state until the next time a timer
>> handler wakes them up. The timer softirq appears to run long enough to
>> cause RCU stalls, and I see it calling hundreds of timer function
>> handlers (call_timer_fn).
>> 
>> I am doing some more investigation with trace_printk(s):
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=stable/trace-hang-6.0.y&id=b779b1e92c97f29333a282ee8f548da02f64de2b
>> 
>> Regarding the timer handlers, I was wondering if it is possible that a
>> large number of timer handlers constantly queued can cause RCU stalls
>> due to the timer softirq taking a very long time. That certainly
>> appears to be the case here. Shouldn't the timer softirq also do
>> rcu_softirq_qs() similar to the ksoftirqd loop, in case there are too
>> many of them?
>> 
>> Here is a full log with trace dump if anyone wants to take a look:
>> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/console.log
>> And the res directory:
>> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/11/artifact/tools/testing/selftests/rcutorture/res/2022.12.30-22.57.13/TREE07.2/
>> 
>> Any thoughts on any patches 6.0 might be missing?
> 
> I can't reproduce it in v6.0 (vanilla, not stable) after 100 runs of 5 minutes,
> so maybe it's actually a few patches too many instead :-)

Thanks for that insight!

- Joel 

> 
>> 
>> Meanwhile, debug here continues... thanks,
>> 
>> - Joel



