It's never a good idea to use fifo:99 for a realtime application; the
migration thread uses fifo:99 and all the interrupt threads run at
fifo:50, so your busy-wait fifo:99 thread is holding off interrupts
(devices, IPIs, etc) on tho

Try running your SCHED_FIFO thread at something like fifo:2 and see if
your performance improves.

Clark

On Wed, Oct 12, 2022 at 9:35 PM Sebastian Kuzminsky <seb@xxxxxxxxxxx> wrote:
>
> Hi folks, I'm seeing a behavior I don't understand, that I'm hoping you
> can help me with.
>
> The setup is:
>     Raspberry Pi 4 (4x ARMv8 Cortex-A72)
>     Debian Bookworm arm64 (https://raspi.debian.net/tested-images/)
>     Linux 5.19.14 with 5.19-rt10 patches (config in git repo below)
>     cmdline:
>         irqaffinity=0-1
>         rcu_nocbs=2-3
>         rcu_nocb_poll
>         nohz_full=2-3
>         isolcpus=nohz,domain,managed_irq,2-3
>
> On this system I run a realtime thread that does all the
> latency-reducing tricks I know of:
>     kernel command line as specified above
>     cpufreq scaling governor = performance
>     mlockall()
>     set cpu affinity to an isolcpu processor
>     run in userspace continuously, never sleep or yield the cpu
>
> The surprising behavior is that if I set the realtime thread to use
> scheduling policy SCHED_FIFO, and set its priority to the max, i get
> terrible behavior - lots of preemption, and lots of idle time on the
> CPU. On the other hand, if I leave the realtime thread with scheduling
> policy SCHED_OTHER, and the default priority of 0, then the system
> performs great - hardly any preemption.
>
> The test program is here:
> https://github.com/SebKuzminsky/preempt-rt-latency-test
>
> That program does the process-wide realtime setup, then runs a realtime
> thread with SCHED_OTHER (which performs well), joins that thread and
> instead runs a second thread with SCHED_FIFO (which performs poorly).
> It doesn't matter which order I run the threads in, SCHED_FIFO first
> still performs poorly.
>
> Both threads run the same function:
>     busywait 1 ms (using the hardware cycle counter for timing)
>     check the cycle count
>     repeat 10k times
>     return
>
> So it doesn't do anything useful, it just looks for latency.
>
> The results look like this:
>
> > using PTHREAD_INHERIT_SCHED to keep SCHED_OTHER (with default priority)
> > scheduling policy: SCHED_OTHER
> > scheduling parameter priority: 0 (min=0, max=0)
> > cpu affinity: 3
> > after 10000 iterations:
> >     min=54001 cycles (1000.019 us, 1.000 ms)
> >     avg=54002.001 cycles (1000.037 us, 1.000 ms)
> >     max=54009 cycles (1000.167 us, 1.000 ms)
> > OK: worst latency < 2 ms
> >
> > using PTHREAD_EXPLICT_SCHED to set SCHED_FIFO (with highest priority)
> > scheduling policy: SCHED_FIFO
> > scheduling parameter priority: 99 (min=1, max=99)
> > cpu affinity: 3
> > after 10000 iterations:
> >     min=54001 cycles (1000.019 us, 1.000 ms)
> >     avg=78900.171 cycles (1461.114 us, 1.461 ms)
> >     max=53188533 cycles (984972.833 us, 984.973 ms)
> > ERROR: worst latency > 2 ms
>
> The results with SCHED_OTHER are great, but the results with SCHED_FIFO
> are terrible. This is surprising to me! I'd expect a thread using
> SCHED_FIFO with max priority to behave at least as well as a thread
> using SCHED_OTHER with the default priority. Am I misunderstanding
> something here, or is this a bug?
>
> I'm happy to run any experiments people suggest.
>
>
> --
> Sebastian Kuzminsky
>
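
For anyone who wants to try the fifo:2 suggestion from the top of this message, here is a minimal sketch in C of one way to build thread attributes for a low-priority SCHED_FIFO thread. It is not taken from the test program: the helper name init_low_fifo_attr() and the IRQ_THREAD_PRIO constant are made up for illustration, and the value 50 is simply the interrupt-thread priority quoted in the reply.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Priority of the threaded interrupt handlers, as quoted in the reply
 * at the top of this message (an assumption about the running system). */
#define IRQ_THREAD_PRIO 50

/*
 * Hypothetical helper, not part of the test program: initialise *attr so
 * that a thread created with it runs SCHED_FIFO at priority prio.
 * Priorities at or above IRQ_THREAD_PRIO are rejected so the thread
 * cannot hold off the interrupt threads.
 * Returns 0 on success or an errno-style value on failure.
 */
int init_low_fifo_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int err;

    if (prio < sched_get_priority_min(SCHED_FIFO) || prio >= IRQ_THREAD_PRIO)
        return EINVAL;

    err = pthread_attr_init(attr);
    if (err)
        return err;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the
     * creator's policy and the settings below are ignored. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &sp);

    if (err)
        pthread_attr_destroy(attr);
    return err;
}
```

A caller would do something like init_low_fifo_attr(&attr, 2) and then pass &attr to pthread_create(). Note that pthread_create() with a realtime policy typically fails with EPERM unless the process has CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO, so the return value is worth checking.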