Hi Kevin, Grazvydas,

On Tue, Apr 24, 2012 at 4:29 PM, Kevin Hilman <khilman@xxxxxx> wrote:
> Jean Pihet <jean.pihet@xxxxxxxxxxxxxx> writes:
>
>> Hi Grazvydas, Kevin,
>>
>> I gathered some performance measurements and statistics using
>> custom tracepoints in __omap3_enter_idle.

I posted the patches for the power domains registers cache, cf.
http://marc.info/?l=linux-omap&m=133587781712039&w=2.

>> All the details are at
>> http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement#C1_performance_problem:_analysis

I updated the page with the measurement results obtained with Kevin's
patches and the registers cache patches. The results show that:
- the registers cache optimizes the low power mode transitions, but on its
  own it is not sufficient to obtain a big gain. A few unused power domains
  are still transitioning, which causes a big penalty in the idle path;
- khilman's optimizations are really helpful. Furthermore, they further
  optimize the registers cache accesses made for the statistics;
- the average time in idle now drops to 246us, which is still very large
  for a CPU-intensive C-state. For reference, with PM disabled the average
  time in idle is 113us.

Regards,
Jean

>> .
>
> This is great, thanks.
>
> [...]
>
>> Here are the results (BW in MB/s) on Beagleboard:
>> - 4.7: without using DMA,
>>
>> - Using DMA
>> 2.1: [0]
>> 2.1: [1] only C1
>> 2.6: [1]+[2] no pre_ post_
>> 2.3: [1]+[5] no pwrdm_for_each_clkdm
>> 2.8: [1]+[5]+[2]
>> 3.1: [1]+[5]+[6] no omap_sram_idle
>> 3.1: No IDLE, no omap_sram_idle, all pwrdms to ON
>>
>> So indeed this shows there is some serious performance issue with the
>> C1 C-state.
>
> Yes, this confirms what both Grazvydas and I are seeing as well.
>
> [...]
>
>> From the list of contributors, the main ones are:
>> (140us) pwrdm_pre_transition and pwrdm_post_transition,
>
> See the series I just posted to address this one:
> [PATCH/RFT 0/3] ARM: OMAP: PM: reduce overhead of pwrdm pre/post transitions
>
>> (105us) omap2_gpio_prepare_for_idle and
>> omap2_gpio_resume_after_idle. This could be avoided if PER stays ON in
>> the latency-critical C-states,
>> (78us) pwrdm_for_each_clkdm(mpu, core, deny_idle/allow_idle),
>> (33us estimated) omap_set_pwrdm_state(mpu, core, neon),
>> (11us) clkdm_allow_idle(mpu). Is this needed?
>
> In that same series, I removed this as it appears to be a remnant of a
> code move (cf. patch 3 in above series.)
>
>> Here are a few questions and suggestions:
>> - In case of latency critical C-states could the high-latency code be
>> bypassed in favor of a much simpler version? Pushing the concept a bit
>> farther one could have a C1 state that just relaxes the cpu (no WFI),
>> a C2 state which bypasses a lot of code in __omap3_enter_idle, and the
>> rest of the C-states as we have today,
>
> I was thinking a "WFI only" state, with *all* powerdomains staying on is
> probably sufficient for C1. Do you see the enter/exit latency from that
> as even being too high?
>
>> - Is it needed to iterate through all the power and clock domains in
>> order to keep them active?
>
> No. My series above starts to address this, but I think Tero's
> use-counting series is the final solution since this should really be
> done when we know the powerdomains are transitioning.
>
> Kevin

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html