Re: cyclictest better values with system load than without (OMAP3530 target)

On 11/29/2013 01:56 PM, Sebastian Andrzej Siewior wrote:
> * Clark Williams | 2013-11-26 10:12:32 [-0600]:
>> In my experience (on x86_64 mainly), that behavior (worse times when
>> not under load) is due to the overhead of coming out of power-save/idle
>> states. When you've got a big load on the system and all the cores are
>> active, then the power-save logic and/or the idle logic doesn't kick in
>> and devices aren't being powered down.
> This is the case here, too. The overhead coming out of a deep power
> state plus the invalidated caches.
Sorry, but I feel that the discussion is somewhat out of sync with the original posting. Let me explain.

Among other mechanisms, processors may use two completely different interfaces to save power:
1. Sleep states aka C states, Linux interface cpuidle
2. Clock frequency modulation aka P states, Linux interface cpufreq

1. Sleep states
Processors may come with a number of C states, from light sleep to deep sleep, to save power when idle. The longer a processor is expected to remain idle, the deeper the sleep state it may enter. Sleep states may be disabled i) per processor and per state in /sys/devices/system/cpu/cpuX/cpuidle/stateN/disable or ii) altogether using the somewhat mislabeled /dev/cpu_dma_latency pseudo device. cyclictest normally uses the latter mechanism to disable sleep states altogether. If it does, it prints the message:
# /dev/cpu_dma_latency set to 0us
The original posting contains this line; in consequence, sleep states cannot be responsible for any observed increase in latency. To check whether sleep states really are disabled, the command
  # cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
may be run repeatedly for every CPU (a small script that automates this check is sketched below the example output). If sleep states are disabled correctly, only the residency counter of the first state (the poll state) increases, such as

# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444330737734
234393550
1760323375
1234658099
183251179053

and sometime later

# cat /sys/devices/system/cpu/cpu0/cpuidle/state?/time
444417947595
234393550
1760323375
1234658099
183251179053
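
A minimal shell sketch that automates this comparison (cpu0 and the 10 s measuring interval are arbitrary choices):

#!/bin/sh
# Snapshot the cpuidle residency counters of cpu0 twice, 10 s apart, and
# report which C states accumulated time in between. With sleep states
# disabled, only the first (poll) state should show a difference.
DIR=/sys/devices/system/cpu/cpu0/cpuidle
before=$(cat $DIR/state*/time)   # one counter per line, in microseconds
sleep 10
after=$(cat $DIR/state*/time)
i=0
for f in $DIR/state*/time
do
  i=$((i + 1))
  b=$(printf '%s\n' "$before" | sed -n "${i}p")
  a=$(printf '%s\n' "$after" | sed -n "${i}p")
  if [ "$a" -ne "$b" ]
  then
    printf '%s: +%s us\n' "$(cat ${f%/time}/name)" "$((a - b))"
  fi
done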

BTW: The cyclictest source contains a related comment:
/* Latency trick
 * if the file /dev/cpu_dma_latency exists,
 * open it and write a zero into it. This will tell
 * the power management system not to transition to
 * a high cstate (in fact, the system acts like idle=poll)
 * When the fd to /dev/cpu_dma_latency is closed, the behavior
 * goes back to the system default.
 *
 * Documentation/power/pm_qos_interface.txt
 */
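
The same effect can be reproduced from an interactive shell: the PM QoS request stays in force for as long as the file descriptor is held open, and recent kernels also accept an ASCII decimal value written to the pseudo device. A minimal sketch (error handling omitted):

exec 3>/dev/cpu_dma_latency   # register a PM QoS request
echo 0 >&3                    # tolerate 0 us wakeup latency: sleep states stay off
# ... run the latency-sensitive workload here ...
exec 3>&-                     # closing the fd restores the system default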

2. Clock frequency modulation
This is an entirely different story, as cyclictest has no business with it at all. The clock frequency of x86 processors has a more or less linear effect on latency, e.g. a system running at 1 GHz will show a latency that is about twice as high as when running at 2 GHz. ARM processors, however, behave differently: many ARM cores do not provide acceptable latency values unless running at full speed. It is, therefore, often mandatory to switch to the performance CPU frequency governor before starting cyclictest or before running a real-world user space application that relies on minimum latency. The /sys/devices/system/cpu/cpuX/cpufreq interface is available to manage P states:
Switch to maximum performance:
cd /sys/devices/system/cpu/
for i in cpu[0-9]*/cpufreq/scaling_governor   # cpu? would miss cpu10 and up
do
  echo performance >$i
done
Switch back to on-demand frequency modulation:
for i in cpu[0-9]*/cpufreq/scaling_governor
do
  echo ondemand >$i
done
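
Whether the switch has taken effect can be verified by reading the settings back, e.g.:

cat cpu[0-9]*/cpufreq/scaling_governor   # should print "performance" on every line
cat cpu[0-9]*/cpufreq/scaling_cur_freq   # should equal cpuinfo_max_freq under
                                         # the performance governor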

BTW: Power saving and real-time do not necessarily exclude each other. If a somewhat longer - but still deterministic - latency is acceptable, some light sleep states and a somewhat lower clock frequency may be allowed, which still may result in considerable energy saving. If, however, the fastest possible real-time response is required, C states and P states must be disabled (or set to polling and maximum speed, respectively) and the power bill must be paid.
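
Using the /dev/cpu_dma_latency sketch from above, such a compromise can be expressed directly: writing a small nonzero value (20 is merely an example) allows the idle governor to use every sleep state whose exit latency does not exceed that bound:

exec 3>/dev/cpu_dma_latency
echo 20 >&3   # tolerate up to 20 us wakeup latency; light sleep states remain usable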

> So the test now finally has better results on an idle system than on
> one with heavy system load. The numbers are still far away from your
> latency values on the 1.2GHz Kirkwood. Does anybody have OMAP3
> values at hand to compare?
This is why we run the OSADL QA Farm. An AM3359 system is in rack 7, slot 5 -> https://www.osadl.org/?id=1590. We run 100 million cycles at a 200 µs cycle interval (which takes about 5 hours and 33 minutes) to obtain reliable data. In addition, recordings are made not only while the processor is idle but also while it executes defined load scenarios. Please do the same before you compare the results. To facilitate the comparison, the cyclictest command line is given below every plot, and any other relevant information (including the kernel command line) is available in the systems' profiles.

Hope this helps,
	-Carsten.