Hi Peter,

On Fri, Nov 19, 2010 at 10:57 AM, Peter 'p2' De Schrijver <peter.de-schrijver@xxxxxxxxx> wrote:
> On Fri, Nov 19, 2010 at 10:46:19AM +0100, ext Jean Pihet wrote:
>> On Fri, Nov 19, 2010 at 2:54 AM, Nishanth Menon <nm@xxxxxx> wrote:
>> > From: Richard Woodruff <r-woodruff2@xxxxxx>
>> >
>> > Analysis in TI kernel with ETM showed that using cache mapped flush
>> > in kernel instead of SO mapped flush cost drops by 65% (3.39mS down
>> > to 1.17mS) for clean_l2 which is used during sleep sequences.
>> > Overall:
>> > - speed up
>> > - unfortunately there isn't a good alternative flush method today
>> > - code reduction and less maintenance and potential bug in
>> > unmaintained code
>> >
>> > This also fixes the bug with the clean_l2 function usage.
>> >
>> > Reported-by: Tony Lindgren <tony@xxxxxxxxxxx>
>> >
>> > [nm@xxxxxx: ported rkw's proposal to 2.6.37-rc2]
>> > Signed-off-by: Nishanth Menon <nm@xxxxxx>
>> > Signed-off-by: Richard Woodruff <r-woodruff2@xxxxxx>
>> > ---
>> >
>> > Side note: just dcache needs to be flushed based on inputs from TI internal team
>> >
>> >  arch/arm/mach-omap2/sleep34xx.S |   79 ++++++-------------------------------------------------------------------
>> >  1 files changed, 13 insertions(+), 66 deletions(-)
>> >
>> > diff --git a/arch/arm/mach-omap2/sleep34xx.S b/arch/arm/mach-omap2/sleep34xx.S
>> > index 2fb205a..8f207b2 100644
>> > --- a/arch/arm/mach-omap2/sleep34xx.S
>> > +++ b/arch/arm/mach-omap2/sleep34xx.S
>> > @@ -520,72 +520,17 @@ clean_caches:
>> >         cmp     r9, #1 /* Check whether L2 inval is required or not*/
>> >         bne     skip_l2_inval
>> >  clean_l2:
>> > -       /* read clidr */
>> > -       mrc     p15, 1, r0, c0, c0, 1
>> > -       /* extract loc from clidr */
>> > -       ands    r3, r0, #0x7000000
>> > -       /* left align loc bit field */
>> > -       mov     r3, r3, lsr #23
>> > -       /* if loc is 0, then no need to clean */
>> > -       beq     finished
>> > -       /* start clean at cache level 0 */
>> > -       mov     r10, #0
>> > -loop1:
>> > -       /* work out 3x current cache level */
>> > -       add     r2, r10, r10, lsr #1
>> > -       /* extract cache type bits from clidr*/
>> > -       mov     r1, r0, lsr r2
>> > -       /* mask of the bits for current cache only */
>> > -       and     r1, r1, #7
>> > -       /* see what cache we have at this level */
>> > -       cmp     r1, #2
>> > -       /* skip if no cache, or just i-cache */
>> > -       blt     skip
>> > -       /* select current cache level in cssr */
>> > -       mcr     p15, 2, r10, c0, c0, 0
>> > -       /* isb to sych the new cssr&csidr */
>> > -       isb
>> > -       /* read the new csidr */
>> > -       mrc     p15, 1, r1, c0, c0, 0
>> > -       /* extract the length of the cache lines */
>> > -       and     r2, r1, #7
>> > -       /* add 4 (line length offset) */
>> > -       add     r2, r2, #4
>> > -       ldr     r4, assoc_mask
>> > -       /* find maximum number on the way size */
>> > -       ands    r4, r4, r1, lsr #3
>> > -       /* find bit position of way size increment */
>> > -       clz     r5, r4
>> > -       ldr     r7, numset_mask
>> > -       /* extract max number of the index size*/
>> > -       ands    r7, r7, r1, lsr #13
>> > -loop2:
>> > -       mov     r9, r4
>> > -       /* create working copy of max way size*/
>> > -loop3:
>> > -       /* factor way and cache number into r11 */
>> > -       orr     r11, r10, r9, lsl r5
>> > -       /* factor index number into r11 */
>> > -       orr     r11, r11, r7, lsl r2
>> > -       /*clean & invalidate by set/way */
>> > -       mcr     p15, 0, r11, c7, c10, 2
>> > -       /* decrement the way*/
>> > -       subs    r9, r9, #1
>> > -       bge     loop3
>> > -       /*decrement the index */
>> > -       subs    r7, r7, #1
>> > -       bge     loop2
>> > -skip:
>> > -       add     r10, r10, #2
>> > -       /* increment cache number */
>> > -       cmp     r3, r10
>> > -       bgt     loop1
>> > -finished:
>> > -       /*swith back to cache level 0 */
>> > -       mov     r10, #0
>> > -       /* select current cache level in cssr */
>> > -       mcr     p15, 2, r10, c0, c0, 0
>> > -       isb
>> > +       /*
>> > +        * jump out to kernel flush routine
>> > +        * - resue that code is better
>> Typo: 'reuse'
>>
>> > +        * - it executes in a cached space so is faster than refetch per-block
>> > +        * - should be faster and will change with kernel
>> > +        * - 'might' have to copy address, load and jump to it
>> > +        */
>> > +       ldr     r1, kernel_flush
>> > +       mov     lr, pc
>> > +       bx      r1
>> It is simpler and more efficient to use:
>>     bl v7_flush_dcache_all
>
> This doesn't work from SRAM though, because the linker will generate a
> PC relative branch which is wrong if the code is moved to SRAM at
> runtime. So the original version needs to stay :)

Correct! My version now runs from DDR, which explains it!

>
> Cheers,
>
> Peter.
>

Thanks,
Jean
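As a quick arithmetic check of the commit message, taking the two ETM-measured clean_l2 times at face value, the quoted reduction and the corresponding speed-up work out as:

```latex
\frac{3.39\,\mathrm{ms} - 1.17\,\mathrm{ms}}{3.39\,\mathrm{ms}}
  = \frac{2.22}{3.39} \approx 0.65 \;(\approx 65\%),
\qquad
\frac{3.39\,\mathrm{ms}}{1.17\,\mathrm{ms}} \approx 2.9
```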
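The removed clean_l2 code walks each cache level up to the Level of Coherency and issues one set/way maintenance MCR per (set, way) pair. The C sketch below models only how the loop2/loop3 operand is built and ordered for one level. It assumes the ARMv7 CCSIDR layout (LineSize in bits [2:0], Associativity in [12:3], NumSets in [27:13]) and assumes assoc_mask and numset_mask hold 0x3ff and 0x7fff, since their definitions are not in the quoted hunk. The function names are hypothetical and nothing here touches real CP15 registers. Note also that, per the ARMv7 encoding, the MCR used (c7, c10, 2) is clean by set/way, even though its comment says clean & invalidate.

```c
#include <stddef.h>
#include <stdint.h>

/* ARM CLZ semantics: clz(0) == 32 (C's __builtin_clz(0) is undefined). */
static unsigned clz32(uint32_t x)
{
	unsigned n = 0;

	if (x == 0)
		return 32;
	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
}

/*
 * Build one set/way operand, as "orr r11, r10, r9, lsl r5" followed by
 * "orr r11, r11, r7, lsl r2" does. level is 0-based; the assembly keeps
 * 2 * level in r10, which places the level in bits [3:1].
 */
static uint32_t setway_operand(unsigned level, uint32_t ccsidr,
			       uint32_t set, uint32_t way)
{
	uint32_t line_shift = (ccsidr & 7u) + 4u;  /* log2(line size in bytes), r2 */
	uint32_t max_way = (ccsidr >> 3) & 0x3ffu; /* ways - 1, assumed assoc_mask */
	unsigned way_shift = clz32(max_way);       /* r5 */
	uint32_t op = (uint32_t)level << 1;

	if (way_shift < 32)                        /* direct-mapped: way is always 0 */
		op |= way << way_shift;
	return op | (set << line_shift);
}

/* Hypothetical hook standing in for "mcr p15, 0, r11, c7, c10, 2". */
typedef void (*setway_op_fn)(uint32_t operand, void *ctx);

/*
 * Walk one level in the same order as loop2/loop3: sets from highest to
 * lowest, and within each set, ways from highest to lowest.
 * Returns the number of operations issued; op may be NULL.
 */
static uint32_t walk_level(unsigned level, uint32_t ccsidr,
			   setway_op_fn op, void *ctx)
{
	uint32_t max_way = (ccsidr >> 3) & 0x3ffu;
	uint32_t max_set = (ccsidr >> 13) & 0x7fffu; /* sets - 1, assumed numset_mask */
	uint32_t count = 0;

	for (uint32_t set = max_set + 1; set-- > 0;) {
		for (uint32_t way = max_way + 1; way-- > 0;) {
			if (op)
				op(setway_operand(level, ccsidr, set, way), ctx);
			count++;
		}
	}
	return count;
}
```

For an assumed example L2 geometry of 256 KiB, 8 ways and 64-byte lines (CCSIDR = (511 << 13) | (7 << 3) | 2, cache level index 1), walk_level(1, ccsidr, NULL, NULL) counts 512 × 8 = 4096 operations; the first operand is 0xE0007FC2 (way 7, set 511) and the last is 0x2 (way 0, set 0).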
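Peter's objection can be made concrete with a small address model. In ARM state a BL stores its target as an offset from its own address plus 8, so copying the code elsewhere shifts the landing point by the same amount, while a literal word read with ldr holds an absolute address that the copy leaves unchanged. The sketch assumes kernel_flush is a .word holding the address of v7_flush_dcache_all (its definition is not part of the quoted hunk), uses made-up DDR and SRAM addresses, and ignores BL's 24-bit immediate encoding; it only shows the arithmetic.

```c
#include <stdint.h>

/* In ARM state, reading PC yields the current instruction's address + 8. */
#define ARM_PC_AHEAD 8u

/* Byte offset a linker would encode for "bl target" linked at bl_addr. */
static int32_t bl_offset(uint32_t bl_addr, uint32_t target)
{
	return (int32_t)(target - (bl_addr + ARM_PC_AHEAD));
}

/* Where that same encoded BL actually lands when executed at exec_addr. */
static uint32_t bl_destination(uint32_t exec_addr, int32_t offset)
{
	return exec_addr + ARM_PC_AHEAD + (uint32_t)offset;
}

/*
 * "ldr r1, kernel_flush; bx r1" jumps to an absolute address stored as
 * data, so the destination does not depend on where the code executes.
 */
static uint32_t literal_destination(uint32_t exec_addr, uint32_t literal)
{
	(void)exec_addr;
	return literal;
}

/*
 * "mov lr, pc" at mov_addr followed by "bx r1" at mov_addr + 4: lr gets
 * mov_addr + 8, i.e. the instruction right after the bx.
 */
static uint32_t lr_after_mov_lr_pc(uint32_t mov_addr)
{
	return mov_addr + ARM_PC_AHEAD;
}
```

With hypothetical addresses (BL linked at 0xC0010000, flush routine at 0xC0020000, SRAM copy of the BL at 0x40200100), the encoded offset is 0xFFF8: the BL reaches the flush routine only at its link address, and from the SRAM copy it lands at 0x40210100 instead. The literal-plus-bx sequence reaches the routine from either location, which is why the original version needs to stay.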