Re: [PATCH] OMAP CPUIDLE: CPU Idle latency measurement

Hi Vishwa,

On Mon, Aug 30, 2010 at 6:29 PM, Sripathy, Vishwanath
<vishwanath.bs@xxxxxx> wrote:
> Kevin,
>
>> -----Original Message-----
>> From: linux-omap-owner@xxxxxxxxxxxxxxx [mailto:linux-omap-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Kevin Hilman
>> Sent: Saturday, August 28, 2010 12:45 AM
>> To: vishwanath.sripathy@xxxxxxxxxx
>> Cc: linux-omap@xxxxxxxxxxxxxxx; linaro-dev@xxxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] OMAP CPUIDLE: CPU Idle latency measurement
>>
>> vishwanath.sripathy@xxxxxxxxxx writes:
>>
>> > From: Vishwanath BS <vishwanath.sripathy@xxxxxxxxxx>
>> >
>> > This patch has instrumentation code for measuring latencies for
>> > various CPUIdle C states for OMAP. The idea here is to capture the
>> > timestamp at various phases of CPU idle and then compute the SW
>> > latency for the various C states.  For OMAP, the 32k clock is chosen
>> > as the reference clock since it is an always-on clock.  Wakeup domain
>> > memory (scratchpad memory) is used for storing the timestamps.  One
>> > can see the worst-case latencies in the sysfs entries below (after
>> > enabling CONFIG_CPU_IDLE_PROF in .config). This information can be
>> > used to correctly configure CPU idle latencies for the various C
>> > states after adding HW latencies to each of these SW latencies.
>> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/actual_latency
>> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/sleep_latency
>> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/wkup_latency
>> >
>> > This patch was tested on OMAP ZOOM3 using Kevin's PM branch.
>> >
>> > Signed-off-by: Vishwanath BS <vishwanath.sripathy@xxxxxxxxxx>
>> > Cc: linaro-dev@xxxxxxxxxxxxxxxx
>>
>> While I have many problems with the implementation details, I won't go
>> into them because in general this is the wrong direction for kernel
>> instrumentation.
>>
>> This approach adds quite a bit of overhead to the idle path itself:
>> all the reads/writes from/to the scratchpad(?), the multiplications
>> and divides in every idle path, and the wait-for-idlest in both the
>> sleep and resume paths.  The additional overhead is non-trivial.
>>
>> Basically, I'd like to get away from custom instrumentation and
>> measurement code inside the kernel itself.  This kind of code never
>> stops growing and morphing into ugliness, and rarely scales well when
>> new SoCs are added.
>>
>> With ftrace/perf, we can add tracepoints at specific points and use
>> external tools to extract and analyze the delays, latencies, etc.
>>
>> The point is to keep the minimum possible in the kernel: just the
>> tracepoints we're interested in.  The rest (calculations, averages,
>> analysis, etc.) does not need to be in the kernel and can be done more
>> easily, with more powerful tools, outside the kernel.
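>>
>> For example, a minimal TRACE_EVENT sketch of what could stay in the
>> kernel (purely illustrative; this event and its fields are
>> assumptions, not an existing kernel event):
>>
>>   #include <linux/tracepoint.h>
>>
>>   TRACE_EVENT(omap_idle_enter,
>>           /* C-state being entered and a raw 32k-timer timestamp */
>>           TP_PROTO(unsigned int state, u32 ts_32k),
>>           TP_ARGS(state, ts_32k),
>>           TP_STRUCT__entry(
>>                   __field(unsigned int, state)
>>                   __field(u32, ts_32k)
>>           ),
>>           TP_fast_assign(
>>                   __entry->state = state;
>>                   __entry->ts_32k = ts_32k;
>>           ),
>>           TP_printk("state=%u ts_32k=%u",
>>                     __entry->state, __entry->ts_32k)
>>   );
>>
>> A matching trace_omap_idle_enter(state, ts) call at the idle entry
>> point is all the kernel carries; pairing entry/exit events and
>> computing the latencies can then be done offline.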
> The challenge here is that we need to take the timestamp at the very
> end of CPU idle, at which point we have no access to DDR and the
> MMU/caches are disabled (on OMAP3). So I am not sure we will be able
> to use ftrace/perf-style tools here. If we choose to exclude the
> assembly code from the measurement, then we will be omitting the major
> contributor to CPU idle latency, namely the ARM context save/restore.
>
> Also, these calculations are done only when the CPUIDLE profiling
> feature is enabled. In a normal production system they will not come
> into the picture at all. So I am not sure the latencies involved in
> these calculations are an issue when we are just doing profiling.
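>
> (Presumably the hooks are guarded so they compile away entirely when
> the option is off, roughly along these lines; the guard pattern shown
> here is an assumption, not quoted from the patch:
>
>   #ifdef CONFIG_CPU_IDLE_PROF
>           /* timestamping and latency bookkeeping */
>   #endif
>
> so a production build pays no cost.)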


There are two other issues with using the 32k timer for latency
measurement.

<snip>
+
+       /* take care of overflow */
+       if (postidle_time < preidle_time)
+               postidle_time += (u32) 0xffffffff;
+       if (wkup_time < sleep_time)
+               wkup_time += (u32) 0xffffffff;
+
<snip>

1. We are checking postidle_time < preidle_time to find out whether
   there has been an overflow. There can be situations in which the
   timer overflows and we still end up with a greater postidle_time,
   so the wrap goes undetected.

2. We are doing the correction for one overflow. What happens if the
   timer overflows a second or third time? Can we keep track of the
   number of overflows and then do the correction accordingly?
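
For intervals shorter than one full wrap of the counter (~36 hours at
32768 Hz), both problems can be sidestepped by computing the delta in
unsigned 32-bit arithmetic, which needs no explicit overflow check at
all. A minimal sketch (the helper name is illustrative):

  /*
   * Wraparound-safe delta on a free-running 32-bit counter.
   * Modulo-2^32 subtraction yields the correct elapsed ticks across
   * a single wrap; intervals spanning two or more wraps are still
   * indistinguishable without extra bookkeeping (e.g. counting
   * overflow interrupts).
   */
  static inline u32 ticks_elapsed(u32 pre, u32 post)
  {
          return post - pre;
  }

This also avoids the off-by-one in the += (u32) 0xffffffff correction:
a 32-bit counter wraps after 0x100000000 ticks, not 0xffffffff.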

Regards,
Silesh

>
> Regards
> Vishwa
>>
>> Kevin
>>