Grazvydas Ignotas <notasas@xxxxxxxxx> writes:

> On Tue, Apr 17, 2012 at 5:30 PM, Kevin Hilman <khilman@xxxxxx> wrote:
>> Grazvydas Ignotas <notasas@xxxxxxxxx> writes:
>>>
>>> Ok I did some tests, all in a mostly idle system with just init, busybox
>>> shell and dd doing a NAND read to /dev/null .
>>
>> Hmm, I seem to get a hang using dd to read from NAND /dev/mtdX on my
>> Overo.  I saw your patch 'mtd: omap2: fix resource leak in prefetch-busy
>> path' but that didn't seem to help my crash.

[...]

> Also only pandora is using NAND DMA mode right now in mainline, the
> default polling mode won't exhibit the latency problem (with all the
> other polling consequences like high CPU usage), so this is needed too
> for the test:

Yeah, I noticed that today when I discovered my dd tests weren't causing
any DMA interrupts. ;)

I switched Overo to use DMA mode by copy/pasting the pdata from the
Pandora board file, and now it's working fine, and I'm seeing throughput
similar to yours.

> I also forgot to mention I was using ubifs in my test (dd'ing a large
> file from it), I don't think it has much effect, but if you want to
> try with that:

[...]

I'm just dd'ing raw bytes from /dev/mtdX to /dev/null, so the filesystem
format shouldn't matter, I guess.

>>> To me it looks like this results from many small things adding up.
>>> Idle is called so often that pwrdm_p*_transition() and those
>>> pwrdm_for_each_clkdm() walks start slowing everything down, perhaps
>>> because they access lots of registers on slow buses?
>>
>> Yes, PRCM register accesses are unfortunately rather slow, and we've
>> known that for some time, but haven't done any detailed analysis of
>> the overhead.
>> Using the function_graph tracer, I was able to see that the pre/post
>> transitions are taking an enormous amount of time:
>>
>> - pwrdm pre-transition:  1400+ us at 600MHz (4000+ us at 125MHz)
>> - pwrdm post-transition: 1600+ us at 600MHz (6000+ us at 125MHz)
>
> Hmm, with this it wouldn't be able to do the ~500+ calls/sec I was
> seeing, so the tracer overhead is probably quite large too..

Yes, tracer overhead is significant there, but it still shows me who the
biggest contributors to the overhead/delay are.

>> Notice the big difference between the 600MHz OPP and the 125MHz OPP.
>> Are you using CPUfreq at all in your tests?  If using cpufreq + the
>> ondemand governor, you're probably running at a low OPP due to lack
>> of CPU activity, which will also affect the latencies in the idle
>> path.
>
> I used the performance governor in my tests, so it all was at 600MHz.

OK, good.

Kevin

>> I'm looking into this in more detail now, and will likely have a few
>> patches for you to experiment with.
>
> Sounds good,
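[Editor's note: the measurement setup discussed above — pinning the CPU
at the highest OPP with the performance governor, tracing the pwrdm
pre/post transition helpers with the function_graph tracer, and driving
the idle path with a raw NAND read — can be sketched roughly as below.
The debugfs mount point, cpufreq sysfs path, dd block size/count, and
the /dev/mtdX partition are assumptions to adjust for your board; run
as root on the target.]

```shell
#!/bin/sh
# Sketch of the latency measurement from the thread, not a verbatim
# reproduction of Kevin's setup.  Assumes debugfs is mounted at
# /sys/kernel/debug and cpu0 has a cpufreq policy.

TRACING=/sys/kernel/debug/tracing
GOV=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

if [ -d "$TRACING" ] && [ -w "$GOV" ]; then
    # Pin the CPU at the highest OPP so the latencies are not skewed
    # by running at a low frequency (see the 600MHz vs 125MHz numbers).
    echo performance > "$GOV"

    # Graph-trace only the power-domain pre/post transition helpers
    # named in the thread.
    echo pwrdm_pre_transition pwrdm_post_transition > "$TRACING/set_graph_function"
    echo function_graph > "$TRACING/current_tracer"

    # Generate load and idle transitions with a raw NAND read.
    # X is a placeholder partition number; bs/count are arbitrary.
    dd if=/dev/mtdX of=/dev/null bs=2048 count=100000

    # Per-call durations (in us) appear in the graph output.
    grep -E 'pwrdm_(pre|post)_transition' "$TRACING/trace" | head -20
else
    echo "tracing or cpufreq interface unavailable; run as root on the target"
fi
```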