Raspberry Pi 4B PREEMPT_RT performance: 32 bit vs 64 bit kernels

Hi list,

I spent some time evaluating different raspberrypi-kernel/PREEMPT_RT patch combinations and found some "interesting" results that maybe one of you can shed a bit of light on.

I am aware that the raspberrypi-kernel is not really vanilla anymore and so it's possible that nothing much can be said about the issue, but I'm giving it a shot nonetheless.

As a preamble here's my evaluation routine:

- set scaling_governor to performance for all CPUs
- pin the GPU frequency to either 250 or 500 MHz
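
For the governor step, a minimal sketch of what this can look like. The function name set_governor and its optional second argument are only illustrative; the default path is the usual cpufreq sysfs location, and writing to it needs root:

```shell
# set_governor GOVERNOR [CPU_SYSFS_ROOT]
# Writes GOVERNOR into the scaling_governor file of every CPU that has one.
# CPU_SYSFS_ROOT defaults to the standard sysfs location; it is a parameter
# only so the function can be tried against a scratch directory.
set_governor() {
    gov="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob if no CPU exposes cpufreq
        [ -e "$f" ] || continue
        printf '%s\n' "$gov" > "$f"
    done
}
```

Called as "set_governor performance" from a root shell, this covers all CPUs in one go. It does not touch the GPU frequency, which has to be pinned separately.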

I use the 32 bit or 64 bit Raspberry Pi OS on the RPi 4B to build and run the 32 bit and 64 bit kernels, respectively.

The major kernel configuration options (besides PREEMPT_RT, found mostly by trial and error):

- disable all kernel profiling, latency measurement and debugging options
- disable process accounting
- use a 1000 Hz timer
- use periodic timer instead of dynamic ticks
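
A quick way to sanity-check a .config against the list above is sketched below. The symbol names are assumptions based on the list, not a dump of the configuration actually used: CONFIG_PREEMPT_RT is spelled CONFIG_PREEMPT_RT_FULL in the 4.19-rt patch set, and CONFIG_BSD_PROCESS_ACCT, CONFIG_PROFILING and CONFIG_FTRACE merely stand in for "process accounting, profiling and latency measurement". check_rt_config is a made-up helper name.

```shell
# check_rt_config FILE
# Prints every expectation that FILE (a kernel .config) does not meet and
# returns non-zero if there was at least one. The symbol lists are
# illustrative guesses, not the exact set used for the measurements.
check_rt_config() {
    cfg="$1"
    bad=0
    # Options expected to be enabled (extended regex, whole-line match)
    for pat in 'CONFIG_PREEMPT_RT(_FULL)?=y' 'CONFIG_HZ_1000=y' 'CONFIG_HZ_PERIODIC=y'; do
        if ! grep -Eqx "$pat" "$cfg"; then
            echo "not set: $pat"
            bad=1
        fi
    done
    # Options expected to be disabled (neither built in nor modular)
    for sym in CONFIG_BSD_PROCESS_ACCT CONFIG_PROFILING CONFIG_FTRACE; do
        if grep -Eqx "$sym=[ym]" "$cfg"; then
            echo "still enabled: $sym"
            bad=1
        fi
    done
    return "$bad"
}
```

Running "check_rt_config .config" in the kernel build directory prints nothing when every listed expectation holds.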

I then boot the different kernels, run

sudo ./cyclictest -M -p 90 -S --mlockall

and in another terminal trigger a rebuild of a Linux kernel with -j4. Then I wait for a while and note down the maximum latency that cyclictest reports. Here are some results:

32 bit kernels:

4.19.71-rt24:   ca. 130 us
5.10.90-rt61:   ca. 140 us
5.18.0-rc7-rt9: ca. 160 us

It looks like there's a rather clear regression with newer kernel versions.
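
As an aside, the maximum could also be pulled out of the cyclictest output instead of being read off the screen. A minimal sketch, assuming the usual per-thread lines that end in "Max: <value>"; max_latency is a hypothetical helper name:

```shell
# max_latency
# Reads cyclictest output on stdin and prints the largest value that
# follows a "Max:" field on any line (0 if no such field is found).
max_latency() {
    awk '{
             for (i = 1; i < NF; i++)
                 if ($i == "Max:" && $(i + 1) + 0 > max)
                     max = $(i + 1) + 0
         }
         END { print max + 0 }'
}
```

One way to use it is to let cyclictest print only its final summary, for example "sudo ./cyclictest -M -p 90 -S --mlockall -q -D 10m | max_latency". The -q and -D options and the 10 minute duration are additions for this example and not part of the command used for the numbers above.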

Here's a 64 bit kernel:

5.18.0-rc7-rt9: ca. 230 us

So that's even worse. Before I went to the full setup (1000 Hz with periodic timer, all kernel debugging disabled, etc., and the GPU frequency pinned), I measured some more kernels:

32 bit:

4.19.71-rt24:   ca. 200 us
5.10.90-rt61:   ca. 200 us
5.15.40-rt43:   ca. 190 us
5.18.0-rc7-rt9: ca. 170 us

64 bit:

5.15.40-rt43:   ca. 220 us
5.18.0-rc7-rt9: ca. 270 us

So the regression across kernel versions isn't as clear-cut here, but one trend is overwhelmingly visible across all these tests:

64 bit kernels have a higher maximum latency when compared to 32 bit kernels.

Do you happen to have an idea why that may be? Is there some additional tweak required for the 64 bit kernels on that hardware?

Also an additional observation and question:

The high latencies are triggered by kernel compiles. Just running stress -c 8 or writing zeros to the SD card does not trigger them. It seems to be specific to that combined workload.

I played with lowering the threaded IRQ priorities for the mmc drivers, but that had no effect at all, as expected, since they run at priority 50 by default and cyclictest runs at 90. Do you have an idea what problem that particular workload might be triggering?
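
For anyone who wants to repeat the IRQ priority experiment, a minimal sketch of how the threads can be found. irq_threads_matching is a hypothetical helper that only filters "pid rtprio comm" lines; the thread names and the priority value in the note below are examples, not the exact ones from my runs:

```shell
# irq_threads_matching PATTERN
# Reads lines of the form "PID RTPRIO COMM" on stdin and prints those whose
# command name starts with "irq/" and contains PATTERN as a substring.
irq_threads_matching() {
    awk -v pat="$1" '$3 ~ /^irq\// && index($3, pat) { print $1, $2, $3 }'
}
```

Fed with "ps -eo pid=,rtprio=,comm= | irq_threads_matching mmc", this lists the mmc IRQ threads with their current realtime priority. A single thread can then be changed with something like "sudo chrt -f -p 40 <pid>".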

Also, coming back to the point about the kernel not being really vanilla: if I rebuild the kernel with latency measurement instrumentation, etc., and get some function traces for high-latency code paths, would you people even consider looking at them? :)

Kind regards,
FPS


--
Biologische Kybernetik
Universität Bielefeld
Phone: ++49 521 106 5535
http://www.uni-bielefeld.de/biologie/Kybernetik/index.html


