Re: Real-time kernel thread performance and optimization

Carsten Emde <C.Emde@xxxxxxxxx> · Wed, 19 Dec 2012 09:10:35 +0100



Simon,

[..]
Bonus-question:
   - Additionally, I've tried running cyclictest alongside all of
the above, and it actually performs rather well, without any
substantial spikes. A strange thing, though, is that the results are
actually better with load than without? (running with -t1 -p 80 -n -i 10000 -l 10000)
   - Loaded: Min: 16, Avg: 41, Max: 177
   - No load: Min: 16, Avg: 97, Max: 263

If the system is less loaded, then the idle thread might be able to
enter deeper levels of sleep.  Deeper levels of sleep have larger
latencies to exit.  You would have to look at your processor specific
values for exiting sleep states to see if this is sufficient to
explain the difference.

If you are running a half-decent version of cyclictest, sleep states are
generally disabled while cyclictest is running. Please watch for the line
    # /dev/cpu_dma_latency set to 0us
which essentially documents this mechanism. Yes, the name of the variable
"cpu_dma_latency" is not obvious, and cyclictest could do a better job by
writing
    Wrote 0 to /dev/cpu_dma_latency and keeping the path open to prevent
    all cores from entering any sleep state
but this is another story.
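The mechanism can be tried by hand from a shell. What follows is a minimal sketch and not cyclictest's own code: the function names and the optional path argument are made up for illustration, and it assumes the file accepts a 32-bit binary value (here four zero bytes). The request only lasts for as long as the file descriptor stays open.

```shell
#!/bin/sh
# Hypothetical helper: request 0us wake-up latency for all cores.
# $1 (optional) is the device path; it defaults to /dev/cpu_dma_latency
# and is a parameter only so the sketch can be tried on a scratch file.
hold_cpu_dma_latency() {
    dev=${1:-/dev/cpu_dma_latency}
    # Open the file on descriptor 3 and leave it open in this shell.
    exec 3>"$dev" || return 1
    # Write the 32-bit value 0 as four raw zero bytes.
    printf '\000\000\000\000' >&3
}

# Hypothetical helper: close descriptor 3 again, which ends the request.
release_cpu_dma_latency() {
    exec 3>&-
}
```

As root, one would call hold_cpu_dma_latency, run the workload from the same shell, and call release_cpu_dma_latency afterwards.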

A patch that was merged into 3.7 makes it possible to enable or disable
individual sleep states of the ladder governor
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=62d6ae880e3e76098d5e345decd2dce443975889).
It applies cleanly to 3.6-RT as well. This allows fine-tuning the sleep states
by state and core, whereas the /dev/cpu_dma_latency mechanism acts on all
states and cores. For example, to disable sleep state 2 and all deeper states
of the ladder governor on core #0, use:
    echo 1 >/sys/devices/system/cpu/cpu0/cpuidle/state2/disable
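If the same is wanted for several cores, a small helper can do the writing. This is only a sketch: the function name and the optional sysfs root argument are hypothetical. It writes 1 to the "disable" file of the given state and of every deeper state explicitly; whether that is needed beyond the single write shown above depends on the governor in use, which is an assumption here and not something the patch description states.

```shell
#!/bin/sh
# Hypothetical helper: disable cpuidle state N and all deeper states of one core.
# $1 = core number, $2 = first state index to disable,
# $3 (optional) = sysfs CPU root; a parameter only so the sketch can be tested.
disable_deep_states() {
    cpu=$1
    first=$2
    root=${3:-/sys/devices/system/cpu}
    for dir in "$root/cpu$cpu/cpuidle"/state[0-9]*; do
        # Skip if the per-state "disable" file is not there.
        [ -e "$dir/disable" ] || continue
        # Strip everything up to "state" to get the numeric index.
        idx=${dir##*state}
        if [ "$idx" -ge "$first" ]; then
            echo 1 > "$dir/disable"
        fi
    done
}
```

For example, "disable_deep_states 0 2" (as root) covers state2 and deeper on core #0 and leaves state0 and state1 alone.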

BTW: To analyze how much time a core spent in a specific sleep state, simply
read the "time" variable (in microseconds) of each of the core's sleep states
repeatedly, e.g. for core #0:
# for i in /sys/devices/system/cpu/cpu0/cpuidle/state[0-4]
> do
>   echo -e "$(cat $i/name):\t$(cat $i/time)"
> done
POLL:	1342984105
C1-IVB:	737109
C3-IVB:	3852451
C6-IVB:	1702683112
C7-IVB:	4366946606
While cyclictest is running with /dev/cpu_dma_latency set to 0, only the POLL
state time increases.

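Because the "time" values are running totals, the interesting number is the difference between two readings. A rough sketch of how that could be scripted follows; the two function names and the optional sysfs root argument are hypothetical, and the state names are assumed not to contain tab characters.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>time" for every cpuidle state of a core.
# $1 = core number, $2 (optional) = sysfs CPU root (parameter for testing only).
snapshot_idle_times() {
    root=${2:-/sys/devices/system/cpu}
    for dir in "$root/cpu$1/cpuidle"/state[0-9]*; do
        [ -r "$dir/time" ] || continue
        printf '%s\t%s\n' "$(cat "$dir/name")" "$(cat "$dir/time")"
    done
}

# Hypothetical helper: given a "before" and an "after" snapshot file,
# print how much time was added to each state in between.
idle_time_delta() {
    # While reading the first file (NR == FNR) remember its values,
    # then subtract them from the values in the second file.
    awk -F '\t' 'NR == FNR { before[$1] = $2; next }
                 { printf "%s\t%.0f\n", $1, $2 - before[$1] }' "$1" "$2"
}
```

A typical use would be to write one snapshot to a file, let the workload run for a while, write a second snapshot and pass both files to idle_time_delta.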
Thanks for the reply! As I wrote in my reply to Frank, I'm not completely
sure if P states are correctly implemented in our system. We're using a
custom BIOS for our custom board, and while P states do show up and are
modifiable (I've currently installed the userspace-governor, and am
manually setting the clock-frequency to the lowest possible at startup),
our board guy is not sure that changing it actually has any effect on the
processor. Yay...:/

Sorry, but this is a complete misunderstanding. C states and P states
are very different 
(http://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different). 
The point made by Frank and my answer related to C states (aka sleep 
states) a processor may enter when idle. The Linux C state interface is 
called cpuidle. The P states you are referring to are related to the 
processor's clock frequency that may be lowered at any time irrespective 
of idle state. The Linux P state interface is called cpufreq. P states 
generally affect the real-time capabilities in a linear and proportional 
way, e.g. a CPU board with a worst-case latency of 100 microseconds at 1 
GHz will have a latency of approximately 200 microseconds at 500 MHz. 
When idle and in deep C state, however, the processor may take several 
milliseconds to wake up and answer an asynchronous external event. This 
is why deep C states should be disabled in a real-time system that may 
become idle. And this is why I mentioned the new interface that allows 
to individually disable a particular sleep state of a particular 
processor core to ensure its deterministic behavior while the other 
cores still may run in energy-saving mode.
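
The proportional rule for P states can be written down as simple arithmetic. This is a rule-of-thumb sketch only: the function name is made up, it assumes latency scales inversely with clock frequency and nothing else changes, and shell integer division truncates the result.

```shell
#!/bin/sh
# Hypothetical helper: estimate worst-case latency after a clock change,
# assuming latency is inversely proportional to the clock frequency.
# $1 = measured latency in microseconds
# $2 = clock frequency at measurement time, in MHz
# $3 = new clock frequency, in MHz
scaled_latency_us() {
    echo $(( $1 * $2 / $3 ))
}
```

With the numbers from above, "scaled_latency_us 100 1000 500" prints 200. The rule does not cover wake-up from deep C states, which adds a delay that does not follow from the clock frequency.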

Hope this helps,
Carsten.