Am Dienstag, den 23.04.2019, 19:21 +0200 schrieb Ahmad Fatoum: > On 4/23/19 7:18 PM, Ahmad Fatoum wrote: > > The ARM Cortex-A Series Programmer's Guide notes[1]: > > > Some care is required with cache maintenance activity in multi-core > > Unfortunately, the only error scenario I could find in the ARM docs > was this one here, which presumes a multi-core system. > > Lucas, do you have a better example in context of barebox on how things > could go awry without this patch? In the Barebox case we don't care about SMP, but about the HW prefetcher, which is an independent master and isn't bound by the instructions executed on the core. So the sequence that could go wrong in Barebox is as follows: 1. CPU core starts invalidating at L1 cache level 2. HW prefetcher decides that a specific address should be brought into the L1 cache. 3. HW prefetcher finds a valid block for the requested address in L2 cache and moves cached data from L2 to L1. 4. Only now CPU core invalidates L2 cache. In the above sequence we now have invalid data in the L1 cache line. The correct sequence will avoid this issue: 1. CPU core starts invalidating at L2 cache level 2. HW prefetcher decides that a specific address should be brought into the L1 cache. 3. HW prefetcher sees invalid tags for the requested address in L2 cache and brings in the data from external memory. 4. CPU core invalidates L1 cache, discarding the prefetched data. With the correct sequence we never end up with invalid data in valid cache lines, we just waste some external memory bandwidth on the discarded prefetch. Regards, Lucas > > > > systems that include a L2C-310 L2 cache (or similar). Cleaning or > > > invalidating the L1 cache and L2 cache will not be a single atomic > > > operation. A core might therefore perform cache maintenance on > > > a particular address in both L1 and L2 caches only as two discrete > > > steps. If another core were to access the affected address between > > > those two actions, a coherency problem can occur. Such problems can > > > be avoided by following two simple rules. > > > > > > * When cleaning, always clean the innermost (L1) cache first and then > > > clean the outer cache(s). > > > * When invalidating, always invalidate the outermost cache first and > > > the L1 cache last. > > > > The current code correctly iterates from inner to outer cache levels > > when flushing/cleaning (r8 == 0), invalidation (r8 == 1) occurs in the > > same direction though. Adjust the invalidation iteration order to start > > from the outermost cache instead. > > > > Equivalent C-Code: > > > > enum cache_op { CACHE_FLUSH = 0, CACHE_INVAL = 1 }; > > register enum cache_op operation asm("r8"); > > register int i asm("r12"); > > register int limit asm("r3") = max_cache_level << 1; // e.g. 4 with L2 max > > > > +if (operation == CACHE_FLUSH) { > > > > i = 0; > > +} else { > > > > + i = limit - 2; > > +} > > > > bool loop_again; > > do { > > > > /* [snip] */ > > > > + if (operation == CACHE_FLUSH) { > > > > i += 2; > > > > loop_again = limit > i; > > > > + } else { > > > > + loop_again = i > 0; > > > > + i -= 2; > > > > + } > > } while (loop_again); > > > > [1]: 18.6 "TLB and cache maintenance broadcast", Version 4.0 > > > > > > Suggested-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx> > > > > Signed-off-by: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx> > > --- > > arch/arm/cpu/cache-armv7.S | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm/cpu/cache-armv7.S b/arch/arm/cpu/cache-armv7.S > > index 0eb0ecfee756..b2d70bece7b9 100644 > > --- a/arch/arm/cpu/cache-armv7.S > > +++ b/arch/arm/cpu/cache-armv7.S > > @@ -83,7 +83,10 @@ hierarchical: > > > > > > > > ands r3, r0, #0x7000000 @ extract loc from clidr > > > > > > > > mov r3, r3, lsr #23 @ left align loc bit field > > > > > > > > beq finished @ if loc is 0, then no need to clean > > > > > > > > - mov r12, #0 @ start clean at cache level 0 > > > > > > + cmp r8, #0 > > > > > > > > +THUMB( ite eq ) > > > > > > + moveq r12, #0 > > > > > > > > + subne r12, r3, #2 @ start invalidate at outmost cache level > > loop1: > > > > > > > > add r2, r12, r12, lsr #1 @ work out 3x current cache level > > > > > > > > mov r1, r0, lsr r2 @ extract cache type bits from clidr > > > > > > > > @@ -118,8 +121,15 @@ THUMB( ite eq ) > > > > > > > > subs r7, r7, #1 @ decrement the index > > > > > > bge loop2 > > skip: > > > > > > + cmp r8, #0 > > > > > > + bne inval_check > > > > > > > > add r12, r12, #2 @ increment cache number > > > > > > cmp r3, r12 > > > > > > + b loop_end_check > > +inval_check: > > > > > > + cmp r12, #0 > > > > > > > > + sub r12, r12, #2 @ decrement cache number > > +loop_end_check: > > #ifdef CONFIG_ARM_ERRATA_814220 > > > > dsb > > #endif > > > > _______________________________________________ barebox mailing list barebox@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/barebox