rcu stall caused by rt task with high minor page fault rate

Hello!

I am running 5.10.104-rt63 SMP PREEMPT_RT on a dual-core imx7d.

Currently I am debugging an rcu stall issue caused by a user-space rt task.

The procedure to reproduce the issue is:
1. Boot the system.
2. initd starts the rt task application (SCHED_FIFO, priority -13,
affinity set to core0).
3. Log in via ssh and start e.g. memtester with the maximum amount of
free RAM available.
4. memtester successfully locks its memory with mlock.
5. After some time the rt task is stuck consuming 100% system time on core0.
6. The kernel produces rcu stall warnings because the rcu kthread does not
get any CPU time on core0.
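
For reference, the setup from step 2 can be sketched roughly as follows. This
is only an illustration, not the actual application code: the helper name
make_rt_pinned() is made up, and the rt priority to pass in is an assumption
(the -13 above is the value as displayed, not necessarily the value given to
sched_setscheduler).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: pin the calling thread to one CPU and switch it
 * to SCHED_FIFO. Returns 0 on success, -1 if either call fails. */
int make_rt_pinned(int cpu, int rt_priority)
{
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = rt_priority };

    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    /* Restrict the thread to the given core (core0 in the test above). */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }

    /* Switching to SCHED_FIFO usually needs CAP_SYS_NICE or a suitable
     * RLIMIT_RTPRIO. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

Once a SCHED_FIFO task set up like this stops blocking, nothing with a lower
priority runs on that core, which matches what I see in steps 5 and 6.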

The vm stats of the rt thread show a minor page fault rate of more than
350k/s.
So the task is stuck in memory handling, and because of the core
binding the rcu kthread does not get any CPU time on core0, which leads
to the stall warnings.
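
The minor fault counter can also be read from inside a process. The following
is a minimal sketch using getrusage(); the function names are made up for
illustration, and it counts faults for the whole process (RUSAGE_SELF), not
for a single thread as in the stats above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken so far by the calling process, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Allocate `pages` pages, write one byte to each of them and return the
 * number of minor faults counted while doing so (or -1 on error). */
long faults_from_touching(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile char *buf;
    long before, after;

    assert(pages > 0 && page_size > 0);

    buf = malloc(pages * (size_t)page_size);
    if (buf == NULL)
        return -1;

    before = minor_faults();
    for (size_t i = 0; i < pages; i++)
        buf[i * (size_t)page_size] = 1; /* first write maps the page */
    after = minor_faults();

    free((void *)buf);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Sampling minor_faults() once per second and taking the difference gives a
rate comparable to the number quoted above.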

According to https://wiki.linuxfoundation.org/realtime/documentation/technical_details/rcu
CONFIG_RCU_BOOST=y should be the solution for such issues.
But enabling RCU_BOOST did not change anything.

See link above:
> However, bugs can happen, including bugs involving infinite loops in high-priority real-time threads. Debugging these problems is more difficult if the system keeps hanging due to OOM. One way to ease debugging is to build with CONFIG_RCU_BOOST=y,

The main cause of the minor page faults is the missing mlock call in the
application.
mlock is always necessary for rt apps.
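
What is missing in the application is roughly the following. This is a sketch
under assumptions, not the actual fix: the helper names and the 64 KiB stack
size are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed worst-case stack usage of the rt thread (illustrative value). */
#define STACK_PREFAULT_BYTES (64 * 1024)

/* Touch one byte per page of a stack buffer so that these pages are
 * already mapped before the rt loop starts. */
static void prefault_stack(void)
{
    volatile unsigned char dummy[STACK_PREFAULT_BYTES];
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    for (size_t i = 0; i < sizeof(dummy); i += (size_t)page)
        dummy[i] = 0;
}

/* Lock all current and future mappings into RAM, then prefault the
 * stack. Returns 0 on success, -1 if mlockall() fails. */
int lock_memory_for_rt(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        /* e.g. RLIMIT_MEMLOCK too low and no CAP_IPC_LOCK */
        perror("mlockall");
        return -1;
    }
    prefault_stack();
    return 0;
}
```

This would be called once during startup, before the task enters its
time-critical loop.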

But as I understand it, RCU_BOOST should help here even if the rt
app is not implemented correctly?

Thanks in advance!


