On 11/29/2013 01:56 PM, Sebastian Andrzej Siewior wrote:
> * Clark Williams | 2013-11-26 10:12:32 [-0600]:
>> In my experience (on x86_64 mainly), that behavior (worse times when
>> not under load) is due to the overhead of coming out of power-save/idle
>> states. When you've got a big load on the system and all the cores are
>> active, then the power-save logic and/or the idle logic doesn't kick in
>> and devices aren't being powered down.
> This is the case here, too. The overhead coming out of a deep power
> state plus the invalidated caches.
Sorry, I feel that the discussion is somewhat out of sync with the
original posting. Let me explain.
Among other mechanisms, processors may use two completely different
interfaces to save power:
1. Sleep states aka C states, Linux interface cpuidle
2. Clock frequency modulation aka P states, Linux interface cpufreq
1. Sleep states
Processors may come with a number of C states, from light sleep to deep
sleep, to save power when idle. The longer a processor is idle, the
deeper the sleep state it may normally enter. Sleep states may be
disabled i) on a per-processor and per-state basis in
/sys/devices/system/cpu/cpuX/cpuidle/stateX/disable or ii) altogether
using the somewhat mislabeled /dev/cpu_dma_latency pseudo device. As far
as cyclictest is concerned, sleep states are normally disabled
altogether. If this is the case, cyclictest prints the message:
# /dev/cpu_dma_latency set to 0us
The original posting contains this line. Consequently, sleep states
cannot be responsible for any observed latency prolongation. To check
whether sleep states are indeed disabled, the command
# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
may be repeated for every CPU. If sleep states are disabled correctly,
only the value of the first state (the poll state) may increase, e.g.
# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444330737734
234393550
1760323375
1234658099
183251179053
and sometime later
# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444417947595
234393550
1760323375
1234658099
183251179053
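To repeat this check conveniently across all CPUs, a small shell loop
may help (a minimal sketch; state0 is the poll state, and only its
residency, given in microseconds, should grow between two runs):
for c in /sys/devices/system/cpu/cpu[0-9]*
do
    echo -n "${c##*/}: "
    echo $(cat $c/cpuidle/state?/time)
done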
BTW: The cyclictest source contains a related comment:
/* Latency trick
* if the file /dev/cpu_dma_latency exists,
* open it and write a zero into it. This will tell
* the power management system not to transition to
* a high cstate (in fact, the system acts like idle=poll)
* When the fd to /dev/cpu_dma_latency is closed, the behavior
* goes back to the system default.
*
* Documentation/power/pm_qos_interface.txt
*/
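BTW: The same trick can be applied from a shell, provided the file
descriptor is kept open for the duration of the measurement (a minimal
sketch; for values other than 0, either check how the kernel parses
ASCII input to this device or write a binary 32-bit value as cyclictest
does):
exec 3> /dev/cpu_dma_latency   # keep fd 3 open during the measurement
echo -n 0 >&3                  # request 0 us wakeup latency (poll)
# ... run cyclictest or the real-time application here ...
exec 3>&-                      # closing the fd restores the default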
2. Clock frequency modulation
This is an entirely different story, as cyclictest has no business with
it at all. On x86 processors, latency scales more or less inversely
with the clock frequency, e.g. a system running at 1 GHz will show a
latency about twice as high as when running at 2 GHz. ARM processors, however,
behave differently. Many ARM cores do not provide acceptable latency
values unless running at full speed. It is, therefore, often mandatory
to switch to the performance CPU frequency governor before starting
cyclictest or before running a real-world user space application that
relies on minimum latency. The /sys/devices/system/cpu/cpuX/cpufreq
interface is available to manage P states:
Switch to maximum performance:
cd /sys/devices/system/cpu/
for i in cpu[0-9]*/cpufreq/scaling_governor
do
    echo performance >$i
done
Switch to on-demand frequency modulation:
for i in cpu[0-9]*/cpufreq/scaling_governor
do
    echo ondemand >$i
done
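Whether the switch took effect can be verified by reading back the
governor and the current frequency (still from within
/sys/devices/system/cpu/):
cat cpu[0-9]*/cpufreq/scaling_governor
cat cpu[0-9]*/cpufreq/scaling_cur_freq   # in kHz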
BTW: Power saving and real-time do not necessarily exclude each other.
If a somewhat longer, but still deterministic, latency is acceptable,
some light sleep states and a slightly lower clock frequency may be
allowed, which may still result in considerable energy savings. If,
however, the fastest possible real-time response is required, C states
and P states must be disabled (or set to polling and maximum speed,
respectively), and the power bill must be paid.
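As an illustration of such a compromise (a sketch only; state numbers
and frequency values differ from system to system, see
scaling_available_frequencies): keep the lightest sleep state, disable
the deeper ones, and cap the clock somewhat below maximum:
cd /sys/devices/system/cpu
for f in cpu[0-9]*/cpuidle/state[2-9]/disable
do
    echo 1 >$f       # disable deeper C states, keep state0 and state1
done
for f in cpu[0-9]*/cpufreq/scaling_max_freq
do
    echo 800000 >$f  # cap at 800 MHz (value in kHz, example only)
done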
> So the test now finally has better results on an idle system than on
> one with heavy system load. The numbers are still far away from your
> latency values on the 1.2GHz Kirkwood. Does anybody have OMAP3
> values at hand to compare?
This is why we run the OSADL QA Farm. An AM3359 system is in rack 7,
slot 5 -> https://www.osadl.org/?id=1590. We run 100 million cycles at a
200 µs cycle interval (which takes about 5 hours and 33 minutes) to
obtain reliable data. In addition, latency is recorded not only while
the processor is idle but also while it executes defined load scenarios. Please do
the same before you compare the results. To facilitate the comparison,
the cyclictest command line is given below every plot, and any other
relevant information (including kernel command line) is available in the
systems' profiles.
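For reference, the command line used there is of this general form (the
exact invocation is the one shown below the plot):
cyclictest -l100000000 -m -Sp99 -i200 -h400 -q
i.e. 100 million loops at a 200 µs interval, memory locked, one thread
per CPU at priority 99, with histogram recording and quiet operation.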
Hope this helps,
-Carsten.