Re: PM related performance degradation on OMAP3

Kevin Hilman <khilman@xxxxxx> · Tue, 24 Apr 2012 07:29:37 -0700

Jean Pihet <jean.pihet@xxxxxxxxxxxxxx> writes:

> Hi Grazvydas, Kevin,
>
> I did gather some performance measurements and statistics using
> custom tracepoints in __omap3_enter_idle.
> All the details are at
> http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement#C1_performance_problem:_analysis
> .

This is great, thanks.

[...]

> Here are the results (BW in MB/s) on Beagleboard:
> - 4.7: without using DMA,
>
> - Using DMA
>   2.1: [0]
>   2.1: [1] only C1
>   2.6: [1]+[2] no pre_ post_
>   2.3: [1]+[5] no pwrdm_for_each_clkdm
>   2.8: [1]+[5]+[2]
>   3.1: [1]+[5]+[6] no omap_sram_idle
>   3.1: No IDLE, no omap_sram_idle, all pwrdms to ON
>
> So indeed this shows there is some serious performance issue with the
> C1 C-state.

Yes, this confirms what both Grazvydas and I are seeing as well.

[...]

> From the list of contributors, the main ones are:
>     (140us) pwrdm_pre_transition and pwrdm_post_transition,

See the series I just posted to address this one:
[PATCH/RFT 0/3] ARM: OMAP: PM: reduce overhead of pwrdm pre/post transitions

>     (105us) omap2_gpio_prepare_for_idle and
> omap2_gpio_resume_after_idle. This could be avoided if PER stays ON in
> the latency-critical C-states,
>     (78us) pwrdm_for_each_clkdm(mpu, core, deny_idle/allow_idle),
>     (33us estimated) omap_set_pwrdm_state(mpu, core, neon),
>     (11 us) clkdm_allow_idle(mpu). Is this needed?

In that same series, I removed this as it appears to be a remnant of a
code move (cf. patch 3 in the above series).

> Here are a few questions and suggestions:
> - In case of latency critical C-states could the high-latency code be
> bypassed in favor of a much simpler version? Pushing the concept a bit
> farther one could have a C1 state that just relaxes the cpu (no WFI),
> a C2 state which bypasses a lot of code in __omap3_enter_idle, and the
> rest of the C-states as we have today,

I was thinking a "WFI only" state, with *all* powerdomains staying on is
probably sufficient for C1.  Do you see the enter/exit latency from that
as even being too high?

> - Is it needed to iterate through all the power and clock domains in
> order to keep them active?

No.  My series above starts to address this, but I think Tero's
use-counting series is the final solution since this should really be
done when we know the powerdomains are transitioning.

Kevin