On Tue, Aug 31, 2010 at 10:28 AM, Sripathy, Vishwanath <vishwanath.bs@xxxxxx> wrote:
>
>
>> -----Original Message-----
>> From: Silesh C V [mailto:saileshcv@xxxxxxxxx]
>> Sent: Tuesday, August 31, 2010 9:53 AM
>> To: Sripathy, Vishwanath
>> Cc: Kevin Hilman; vishwanath.sripathy@xxxxxxxxxx; linux-omap@xxxxxxxxxxxxxxx;
>> linaro-dev@xxxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] OMAP CPUIDLE: CPU Idle latency measurement
>>
>> Hi Vishwa,
>>
>> On Mon, Aug 30, 2010 at 6:29 PM, Sripathy, Vishwanath
>> <vishwanath.bs@xxxxxx> wrote:
>> > Kevin,
>> >
>> >> -----Original Message-----
>> >> From: linux-omap-owner@xxxxxxxxxxxxxxx [mailto:linux-omap-
>> >> owner@xxxxxxxxxxxxxxx] On Behalf Of Kevin Hilman
>> >> Sent: Saturday, August 28, 2010 12:45 AM
>> >> To: vishwanath.sripathy@xxxxxxxxxx
>> >> Cc: linux-omap@xxxxxxxxxxxxxxx; linaro-dev@xxxxxxxxxxxxxxxx
>> >> Subject: Re: [PATCH] OMAP CPUIDLE: CPU Idle latency measurement
>> >>
>> >> vishwanath.sripathy@xxxxxxxxxx writes:
>> >>
>> >> > From: Vishwanath BS <vishwanath.sripathy@xxxxxxxxxx>
>> >> >
>> >> > This patch has instrumentation code for measuring latencies for
>> >> > various CPUIdle C states for OMAP. The idea here is to capture the
>> >> > timestamp at various phases of CPU idle and then compute the sw
>> >> > latency for the various C states. For OMAP, the 32k clock is chosen
>> >> > as the reference clock since it is an always-on clock. Wakeup-domain
>> >> > memory (scratchpad memory) is used for storing the timestamps. One
>> >> > can see the worst-case latencies in the sysfs entries below (after
>> >> > enabling CONFIG_CPU_IDLE_PROF in .config). This information can be
>> >> > used to correctly configure cpuidle latencies for the various C
>> >> > states after adding the HW latencies to each of these sw latencies.
>> >> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/actual_latency
>> >> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/sleep_latency
>> >> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/wkup_latency
>> >> >
>> >> > This patch has been tested on OMAP ZOOM3 using Kevin's pm branch.
>> >> >
>> >> > Signed-off-by: Vishwanath BS <vishwanath.sripathy@xxxxxxxxxx>
>> >> > Cc: linaro-dev@xxxxxxxxxxxxxxxx
>> >>
>> >> While I have many problems with the implementation details, I won't go
>> >> into them because in general this is the wrong direction for kernel
>> >> instrumentation.
>> >>
>> >> This approach adds quite a bit of overhead to the idle path itself:
>> >> all the reads/writes from/to the scratchpad(?) and all the
>> >> multiplications and divides in every idle path, as well as the
>> >> wait-for-idlest in both the sleep and resume paths. The additional
>> >> overhead added is non-trivial.
>> >>
>> >> Basically, I'd like to get away from custom instrumentation and
>> >> measurement code inside the kernel itself. This kind of code never
>> >> stops growing and morphing into ugliness, and it rarely scales well
>> >> when new SoCs are added.
>> >>
>> >> With ftrace/perf, we can add tracepoints at specific points and use
>> >> external tools to extract and analyze the delays, latencies, etc.
>> >>
>> >> The point is to keep the minimum possible in the kernel: just the
>> >> tracepoints we're interested in. The rest (calculations, averages,
>> >> analysis, etc.) does not need to be in the kernel and can be done more
>> >> easily and with more powerful tools outside the kernel.
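As a concrete illustration of that split, the arithmetic really can live entirely outside the kernel: given raw 32k-counter samples captured around the idle transition, the latency figures fall out of a few lines of user-space post-processing. The sketch below is only that, an illustration with assumed names and timestamp semantics (preidle/sleep/wkup/postidle), not code from the patch:

/* Sketch only: user-space stand-in with assumed timestamp semantics. */
#include <stdint.h>
#include <stdio.h>

#define CLOCK_32K_HZ 32768u	/* 32k sync counter rate */

struct idle_stamps {
	uint32_t preidle;	/* entering the cpuidle handler          */
	uint32_t sleep;		/* last sample before the sleep sequence */
	uint32_t wkup;		/* first sample after hardware wakeup    */
	uint32_t postidle;	/* back in the cpuidle handler           */
};

/* Convert 32k ticks to microseconds; 64-bit intermediate avoids overflow. */
static uint32_t ticks_to_us(uint32_t ticks)
{
	return (uint32_t)(((uint64_t)ticks * 1000000u) / CLOCK_32K_HZ);
}

int main(void)
{
	struct idle_stamps s = { 100, 180, 5000, 5060 };	/* made-up samples */

	printf("sleep latency: %u us, wakeup latency: %u us\n",
	       ticks_to_us(s.sleep - s.preidle),
	       ticks_to_us(s.postidle - s.wkup));
	return 0;
}

With that split, only the raw counter reads themselves would have to stay in the idle path; the per-transition multiply and divide that the overhead comment above refers to move into post-processing.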
>> > The challenge here is that we need to take the timestamp at the tail
>> > end of CPU idle, which means we have no access to DDR, and the
>> > MMU/caches are disabled, etc. (on OMAP3). So I am not sure whether we
>> > will be able to use ftrace/perf kind of tools here. If we choose to
>> > exclude the assembly code part from the measurement, then we will be
>> > omitting a major contributor to CPU idle latency, namely the ARM
>> > context save/restore part.
>> >
>> > Also, these calculations are done only when the CPUIDLE profiling
>> > feature is enabled. In a normal production system, they will not come
>> > into the picture at all. So I am not sure the latencies involved in
>> > these calculations are still an issue when we are just doing profiling.
>>
>> There are two other issues when we use the 32k timer for latency
>> measurement.
>>
>> <snip>
>> +
>> +	/* take care of overflow */
>> +	if (postidle_time < preidle_time)
>> +		postidle_time += (u32) 0xffffffff;
>> +	if (wkup_time < sleep_time)
>> +		wkup_time += (u32) 0xffffffff;
>> +
>> <snip>
>>
>> 1. We are checking postidle_time < preidle_time to find out whether
>> there has been an overflow or not. There can be situations in which the
>> timer overflows and we still have a greater postidle_time.
>>
>> 2. We are doing the correction for one overflow. What happens if the
>> timer overflows a second or third time? Can we keep track of the number
>> of overflows and then do the correction accordingly?
>
> Unfortunately, there is no way to check whether overflow happens more
> than once in the 32k timer and, as you said, it is theoretically
> possible that if the timer overflows more than once these calculations
> will be wrong. Having said that, do you really see any use case where
> the system will idle for more than 37 hours in a single cpuidle
> execution and cause the timer to overflow?

I am not sure. But can we completely write off such a possibility?

Regards,
Silesh

>
> Vishwa
>>
>> Regards,
>> Silesh
>>
>> >
>> > Regards
>> > Vishwa
>>
>> >> Kevin
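On the overflow question, a small stand-alone sketch (hypothetical values, not the patch code) of why an unsigned 32-bit subtraction already copes with a single wrap of the 32k counter: the subtraction is performed modulo 2^32, so no explicit correction branch is needed as long as the samples are kept as u32, whereas a second wrap within one idle period (a little over 36 hours per wrap at 32768 Hz) cannot be detected from two samples alone. Note also that 0xffffffff is one tick short of the counter's true period of 2^32.

/* Sketch only: modulo-2^32 delta across a counter wrap. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t preidle  = 0xfffffff0u;	/* sampled just before the wrap */
	uint32_t postidle = 0x00000010u;	/* sampled just after the wrap  */

	/*
	 * Unsigned subtraction wraps modulo 2^32, so the delta is correct
	 * provided the counter wrapped at most once between the samples.
	 */
	uint32_t delta = postidle - preidle;

	printf("delta = %u ticks (~%.2f ms at 32768 Hz)\n",
	       delta, delta * 1000.0 / 32768.0);
	return 0;
}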