On 2020-12-30 14:09:19 [+0100], Jonathan Schwender wrote:
> Hi everyone,
>
> I've been trying to test the real-time `performance` possible with
> containers, by running cyclictest in a container on an RT-Kernel.
> The issue I've been having does not require containers or an
> RT kernel though.
>
> Issue: cyclictest freezes after running for a few seconds
> to minutes. After that only the loadavg section is updated,
> while the count line does not change anymore.
> cyclictest can't be killed after that point
> other than by restarting the machine, and
> this also takes a few minutes until the kernel kills
> cyclictest.
>
> This behaviour only occurs when the following conditions are
> met:
>
> - RT_GROUP_SCHED is used
> - cyclictest is bound to an isolated cpu core with
>   nohz_full=<core>, and isolcpus=nohz,domain,<core>

So if you remove RT_GROUP_SCHED and use cyclictest on the nohz_full
cores then everything is fine?

> I've tested this on a machine with Fedora 33 and vanilla
> stable 5.10.3 kernel with RT_GROUP_SCHED.
> The same behaviour also exists on 5.10.1-rt20 with
> PREEMPT_RT and RT_GROUP_SCHED configured.
>
> After booting I configure the rt_runtime_us like this:
> `echo "700000" > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us`
> `echo "100000" > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us`
>
> Then I start cyclictest via:
> `taskset -c 14 cgexec -g cpu,cpuacct:user.slice cyclictest --mlockall \
> --priority=96 --interval=200 --affinity=14 --duration=15m`
>
> These are the cmdline options I tried out to narrow the problem down:
> working: `isolcpus=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> working: `isolcpus=nohz,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> working: `isolcpus=nohz,domain,14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> broken: `isolcpus=nohz,domain,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
>
> systemd.unified_cgroup_hierarchy=0 is needed to get cgroups v1, which
> seems to be needed for RT_GROUP_SCHED (at least I couldn't
> find any options similar to cpu.rt_runtime_us with the default
> cgroup v2).
> Basically, it boils down to the combination of the
> domain parameter to isolcpus and nohz_full together with
> RT_GROUP_SCHED causing the problem I'm observing.
>
> Does anyone have any idea what could be causing this?
> Am I doing something wrong, or is there an issue with cyclictest or
> even the kernel that's causing this?
>
> My motivation is running (testing) a real-time container on isolated
> cores, so I think I do need all the kernel parameters I used above to
> get good latencies.

You might want to try without nohz_full. My understanding is that this
is used if your application remains mostly in userland (and uses no
syscalls, etc.). Let me put this on my list of things to try out.

> Regards,
>
> Jonathan Schwender

Sebastian
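
For anyone repeating the comparison of the four command lines quoted above, a small shell sketch along these lines might help flag a boot configuration that matches the reported "broken" pattern (an explicit `domain` flag in isolcpus together with nohz_full). The helper names `get_param` and `classify_cmdline` are hypothetical and not part of cyclictest or rt-tests. The check only mirrors the pattern described in this thread: it does not test whether RT_GROUP_SCHED is enabled, and it does not treat a bare `isolcpus=<cpus>` (where domain is the default) as a match, since that combination with nohz_full was not among the reported cases.

```shell
#!/bin/sh
# Hypothetical helpers: classify a kernel command line against the
# combination reported as "broken" in this thread. This is a sketch,
# not a diagnosis of the underlying kernel behaviour.

# Print the value of a "key=value" parameter found in a command-line string.
# $1 = parameter name, $2 = command line
get_param() {
    for word in $2; do
        case "$word" in
            "$1"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Print "matches-broken-combination" if isolcpus carries an explicit
# "domain" flag AND nohz_full is set; otherwise print "no-match".
# $1 = command line
classify_cmdline() {
    isol=$(get_param isolcpus "$1") || isol=""
    nohz=$(get_param nohz_full "$1") || nohz=""

    # Wrap in commas so "domain" is matched only as a whole flag.
    case ",$isol," in
        *,domain,*) has_domain=yes ;;
        *)          has_domain=no ;;
    esac

    if [ "$has_domain" = yes ] && [ -n "$nohz" ]; then
        echo "matches-broken-combination"
    else
        echo "no-match"
    fi
}
```

On a running system the function could be fed the current boot parameters with `classify_cmdline "$(cat /proc/cmdline)"`.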