Hi folks, I'm seeing behavior I don't understand, and I'm hoping you
can help me with it.

The setup is:

    Raspberry Pi 4 (4x ARMv8 Cortex-A72)
    Debian Bookworm arm64 (https://raspi.debian.net/tested-images/)
    Linux 5.19.14 with 5.19-rt10 patches (config in git repo below)
    cmdline:
        irqaffinity=0-1
        rcu_nocbs=2-3
        rcu_nocb_poll
        nohz_full=2-3
        isolcpus=nohz,domain,managed_irq,2-3

On this system I run a realtime thread that does all the
latency-reducing tricks I know of:

    kernel command line as specified above
    cpufreq scaling governor = performance
    mlockall()
    set cpu affinity to an isolcpu processor
    run in userspace continuously, never sleep or yield the cpu
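
For readers who want to see the userspace part of that list concretely, here is a minimal sketch of the memory-locking and thread-attribute setup. It is not the code from the repository linked below; the helper names `lock_memory` and `make_rt_attr` are hypothetical, and picking the policy's maximum priority is an assumption made to match the description.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

/* Hypothetical helper: lock current and future pages into RAM so the
 * realtime thread does not take page faults later. Returns 0 on success,
 * -1 with errno set on failure. */
int lock_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Hypothetical helper: initialize `attr` for a thread pinned to `cpu`
 * that uses `policy` (e.g. SCHED_FIFO) at that policy's maximum priority.
 * Returns 0 on success or an errno-style code on failure. */
int make_rt_attr(pthread_attr_t *attr, int cpu, int policy)
{
    cpu_set_t cpus;
    struct sched_param param = { 0 };
    int err;

    err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    /* Pin the future thread to a single (isolated) CPU. */
    CPU_ZERO(&cpus);
    CPU_SET(cpu, &cpus);
    err = pthread_attr_setaffinity_np(attr, sizeof(cpus), &cpus);
    if (err != 0)
        goto fail;

    /* Without PTHREAD_EXPLICIT_SCHED the policy and priority below are
     * ignored and the thread inherits the creator's scheduling. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err != 0)
        goto fail;

    /* Set the policy before the priority so the priority is validated
     * against the right range. */
    err = pthread_attr_setschedpolicy(attr, policy);
    if (err != 0)
        goto fail;

    param.sched_priority = sched_get_priority_max(policy);
    err = pthread_attr_setschedparam(attr, &param);
    if (err != 0)
        goto fail;

    return 0;

fail:
    pthread_attr_destroy(attr);
    return err;
}
```

The resulting attribute object would be passed to pthread_create(). Creating a SCHED_FIFO thread this way normally requires root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO, and mlockall() can fail if RLIMIT_MEMLOCK is too low.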
The surprising behavior is that if I set the realtime thread to use
scheduling policy SCHED_FIFO, and set its priority to the max, I get
terrible behavior - lots of preemption, and lots of idle time on the
CPU. On the other hand, if I leave the realtime thread with scheduling
policy SCHED_OTHER, and the default priority of 0, then the system
performs great - hardly any preemption.
The test program is here:
https://github.com/SebKuzminsky/preempt-rt-latency-test
That program does the process-wide realtime setup, then runs a realtime
thread with SCHED_OTHER (which performs well), joins that thread, and
then runs a second thread with SCHED_FIFO (which performs poorly).
The order in which I run the threads doesn't matter: with SCHED_FIFO
first, it still performs poorly. Both threads run the same function:

    busywait 1 ms (using the hardware cycle counter for timing)
    check the cycle count
    repeat 10k times
    return

So it doesn't do anything useful, it just looks for latency.
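
A minimal sketch of such a measurement loop follows. It is not the code from the repository: the names `read_counter`, `counter_hz`, `latency_stats` and `measure` are hypothetical, and the choice of counter is an assumption. On arm64 the sketch reads the architected virtual counter (CNTVCT_EL0); elsewhere it falls back to clock_gettime(CLOCK_MONOTONIC) in nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read a free-running counter (ASSUMPTION: the arm64 virtual counter;
 * the real test program may use a different one). */
static inline uint64_t read_counter(void)
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Counter frequency in Hz, matching read_counter(). */
static inline uint64_t counter_hz(void)
{
#if defined(__aarch64__)
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
#else
    return 1000000000ull; /* nanosecond ticks */
#endif
}

struct latency_stats {
    uint64_t min;
    uint64_t max;
    double avg;
};

/* Busy-wait for period_ticks per iteration and record how long each
 * iteration really took. Any time the thread is off the CPU shows up
 * as an iteration much longer than period_ticks. */
struct latency_stats measure(unsigned iterations, uint64_t period_ticks)
{
    struct latency_stats s = { UINT64_MAX, 0, 0.0 };
    uint64_t total = 0;

    for (unsigned i = 0; i < iterations; i++) {
        uint64_t start = read_counter();
        uint64_t now;

        do {
            now = read_counter();
        } while (now - start < period_ticks);

        uint64_t elapsed = now - start;
        if (elapsed < s.min)
            s.min = elapsed;
        if (elapsed > s.max)
            s.max = elapsed;
        total += elapsed;
    }

    s.avg = iterations ? (double)total / iterations : 0.0;
    return s;
}
```

Calling `measure(10000, counter_hz() / 1000)` corresponds to the 1 ms busywait repeated 10k times described above.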
The results look like this:

    using PTHREAD_INHERIT_SCHED to keep SCHED_OTHER (with default priority)
    scheduling policy: SCHED_OTHER
    scheduling parameter priority: 0 (min=0, max=0)
    cpu affinity: 3
    after 10000 iterations:
        min=54001 cycles (1000.019 us, 1.000 ms)
        avg=54002.001 cycles (1000.037 us, 1.000 ms)
        max=54009 cycles (1000.167 us, 1.000 ms)
    OK: worst latency < 2 ms

    using PTHREAD_EXPLICT_SCHED to set SCHED_FIFO (with highest priority)
    scheduling policy: SCHED_FIFO
    scheduling parameter priority: 99 (min=1, max=99)
    cpu affinity: 3
    after 10000 iterations:
        min=54001 cycles (1000.019 us, 1.000 ms)
        avg=78900.171 cycles (1461.114 us, 1.461 ms)
        max=53188533 cycles (984972.833 us, 984.973 ms)
    ERROR: worst latency > 2 ms

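
To make the cycle counts easier to read: the microsecond figures in the output are consistent with a 54 MHz counter (54001 cycles / 54 MHz = 1000.019 us). That frequency is inferred from the numbers in this post, not taken from the program, and the function names below are hypothetical. A small sketch of the arithmetic:

```c
/* ASSUMPTION: 54 MHz counter, inferred from the posted output. */
#define ASSUMED_COUNTER_HZ 54000000.0

/* Convert a cycle count to microseconds. */
double cycles_to_us(double cycles)
{
    return cycles / ASSUMED_COUNTER_HZ * 1e6;
}

/* Total time spent beyond the nominal period over a whole run, in
 * seconds, given the average and nominal cycles per iteration. */
double excess_seconds(double avg_cycles, double nominal_cycles,
                      unsigned iterations)
{
    return (avg_cycles - nominal_cycles) * iterations / ASSUMED_COUNTER_HZ;
}
```

Taking the minimum of 54001 cycles as the nominal period, `excess_seconds(78900.171, 54001, 10000)` comes to roughly 4.6 seconds lost over a nominal 10-second SCHED_FIFO run, while the same calculation for the SCHED_OTHER run gives well under a millisecond.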
The results with SCHED_OTHER are great, but the results with SCHED_FIFO
are terrible. This is surprising to me! I'd expect a thread using
SCHED_FIFO with max priority to behave at least as well as a thread
using SCHED_OTHER with the default priority. Am I misunderstanding
something here, or is this a bug?
I'm happy to run any experiments people suggest.
--
Sebastian Kuzminsky