Re: Issue with cyclictest, RT_GROUP_SCHED, isolcpus and NOHZ_FULL

On 2020-12-30 14:09:19 [+0100], Jonathan Schwender wrote:
> Hi everyone,
> 
> I've been trying to test the real-time performance possible with
> containers, by running cyclictest in a container on an RT kernel.
> The issue I've been having does not require containers or an
> RT kernel though.
> 
> Issue: cyclictest freezes after running for a few seconds
> to minutes. After that, only the loadavg section is updated,
> while the count line no longer changes.
> From that point on, cyclictest can't be killed other than
> by restarting the machine, and even then it takes a few
> minutes until the kernel kills cyclictest.
> 
> This behaviour only occurs when the following conditions are
> met:
> 
> - RT_GROUP_SCHED is used
> - cyclictest is bound to an isolated cpu core with
>   nohz_full=<core>, and isolcpus=nohz,domain,<core>

So if you remove RT_GROUP_SCHED and use cyclictest on the nohz_full
cores then everything is fine?
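
To make sure we are comparing the same setups, it would help to see what
the running kernel actually reports for each boot. A minimal sketch,
assuming a distro-style config file at /boot/config-$(uname -r) (a
self-built kernel may not install one); cmdline_param and
show_isolation_state are made-up helper names, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: print the value of a kernel command-line parameter.
# Prints nothing if the parameter is absent. Reads /proc/cmdline unless a
# command line is passed as the second argument.
cmdline_param() {
    name=$1
    line=${2-$(cat /proc/cmdline)}
    for word in $line; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
}

# Hypothetical helper: dump the isolation-related state of the running kernel.
show_isolation_state() {
    # CPU lists as the kernel sees them (files may be empty or missing).
    for f in /sys/devices/system/cpu/isolated /sys/devices/system/cpu/nohz_full; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$f")"
        fi
    done

    # Assumption: the kernel config is installed next to the kernel image.
    cfg=/boot/config-$(uname -r)
    if [ -r "$cfg" ]; then
        grep 'CONFIG_RT_GROUP_SCHED[ =]' "$cfg" || echo "CONFIG_RT_GROUP_SCHED: not found in $cfg"
    fi

    # What was requested on the command line.
    for p in isolcpus nohz_full; do
        printf '%s=%s\n' "$p" "$(cmdline_param "$p")"
    done
}
```

Running show_isolation_state once on a working boot and once on the
broken one should show whether the only difference is really nohz_full
on top of the domain flag.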

> I've tested this on a machine with Fedora 33 and vanilla
> stable 5.10.3 kernel with RT_GROUP_SCHED.
> The same behaviour also exists on 5.10.1-rt20 with
> PREEMPT_RT and RT_GROUP_SCHED configured.
> 
> After booting I configure the rt_runtime_us like this:
> `echo "700000" > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us`
> `echo "100000" > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us`
> 
> Then I start cyclictest via:
> `taskset -c 14 cgexec -g cpu,cpuacct:user.slice cyclictest --mlockall \
>   --priority=96 --interval=200 --affinity=14 --duration=15m`
> 
> These are the cmdline options I tried out to narrow the problem down:
> working: `isolcpus=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> working: `isolcpus=nohz,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> working: `isolcpus=nohz,domain,14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> broken:  `isolcpus=nohz,domain,14 nohz_full=14 irqaffinity=0-3 maxcpus=15
> systemd.unified_cgroup_hierarchy=0`
> 
> systemd.unified_cgroup_hierarchy=0 is needed to get cgroup v1,
> which seems to be needed for RT_GROUP_SCHED (at least I couldn't
> find any options similar to cpu.rt_runtime_us with the default
> cgroup v2).
> Basically, it boils down to this: the combination of the
> domain flag to isolcpus and nohz_full, together with
> RT_GROUP_SCHED, causes the problem I'm observing.
> 
> Does anyone have any idea what could be causing this?
> Am I doing something wrong, or is there an issue with cyclictest or
> even the kernel that's causing this?
> 
> My motivation is running (testing) a real-time container on isolated
> cores, so I think I do need all the kernel parameters I used above to
> get good latencies.

You might want to try without nohz_full. My understanding is that it is
meant for applications that remain mostly in userland (and make no
syscalls, etc.).
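
cyclictest is not that kind of workload. As a rough sketch, assuming
--interval is in microseconds (the default unit) and that each cycle is
one sleep syscall (my assumption about the default timer mode), the
command line above enters the kernel thousands of times per second.
wakeups_per_second is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: timer wakeups per second for a given
# cyclictest interval in microseconds (integer division).
wakeups_per_second() {
    interval_us=$1
    echo $((1000000 / interval_us))
}

wakeups_per_second 200   # prints 5000
```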

Let me put this on my list of things to try out.

> Regards,
> 
> Jonathan Schwender

Sebastian



