Re: [PATCH 3/5] ARM: cache-armv7: start invalidation from outer levels

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am Dienstag, den 23.04.2019, 19:21 +0200 schrieb Ahmad Fatoum:
> On 4/23/19 7:18 PM, Ahmad Fatoum wrote:
> > The ARM Cortex-A Series Programmer's Guide notes[1]:
> > > Some care is required with cache maintenance activity in multi-core
> 
> Unfortunately, the only error scenario I could find in the ARM docs
> was this one here, which presumes a multi-core system.
> 
> Lucas, do you have a better example in context of barebox on how things
> could go awry without this patch?

In the Barebox case we don't care about SMP, but about the HW
prefetcher, which is an independent master and isn't bound by the
instructions executed on the core.

So the sequence that could go wrong in Barebox is as follows:
1. CPU core starts invalidating at L1 cache level
2. HW prefetcher decides that a specific address should be brought into
the L1 cache.
3. HW prefetcher finds a valid block for the requested address in L2
cache and moves cached data from L2 to L1.
4. Only now CPU core invalidates L2 cache.

In the above sequence we now have invalid data in the L1 cache line.
The correct sequence will avoid this issue:

1. CPU core starts invalidating at L2 cache level
2. HW prefetcher decides that a specific address should be brought into
the L1 cache.
3. HW prefetcher sees invalid tags for the requested address in L2
cache and brings in the data from external memory.
4. CPU core invalidates L1 cache, discarding the prefetched data.

With the correct sequence we never end up with invalid data in valid
cache lines, we just waste some external memory bandwidth on the
discarded prefetch.

Regards,
Lucas

> 
> > > systems that include a L2C-310 L2 cache (or similar). Cleaning or
> > > invalidating the L1 cache and L2 cache will not be a single atomic
> > > operation. A core might therefore perform cache maintenance on
> > > a particular address in both L1 and L2 caches only as two discrete
> > > steps. If another core were to access the affected address between
> > > those two actions, a coherency problem can occur. Such problems can
> > > be avoided by following two simple rules.
> > > 
> > > * When cleaning, always clean the innermost (L1) cache first and then
> > >   clean the outer cache(s).
> > > * When invalidating, always invalidate the outermost cache first and
> > >   the L1 cache last.
> > 
> > The current code correctly iterates from inner to outer cache levels
> > when flushing/cleaning (r8 == 0), invalidation (r8 == 1) occurs in the
> > same direction though. Adjust the invalidation iteration order to start
> > from the outermost cache instead.
> > 
> > Equivalent C-Code:
> > 
> > 	enum cache_op { CACHE_FLUSH = 0, CACHE_INVAL = 1 };
> > 	register enum cache_op operation asm("r8");
> > 	register int i asm("r12");
> > 	register int limit asm("r3") = max_cache_level << 1; // e.g. 4 with L2 max
> > 
> > 	+if (operation == CACHE_FLUSH) {
> > > > 	 	i = 0;
> > 	+} else {
> > > > 	+	i = limit - 2;
> > 	+}
> > 
> > 	 bool loop_again;
> > 	 do {
> > > > 		/* [snip] */
> > > > 	+	if (operation == CACHE_FLUSH) {
> > > > 			i += 2;
> > > > 			loop_again = limit > i;
> > > > 	+	} else {
> > > > 	+		loop_again = i > 0;
> > > > 	+		i -= 2;
> > > > 	+	}
> > 	} while (loop_again);
> > 
> > [1]: 18.6 "TLB and cache maintenance broadcast", Version 4.0
> > 
> > > > Suggested-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> > > > Signed-off-by: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
> > ---
> >  arch/arm/cpu/cache-armv7.S | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/cpu/cache-armv7.S b/arch/arm/cpu/cache-armv7.S
> > index 0eb0ecfee756..b2d70bece7b9 100644
> > --- a/arch/arm/cpu/cache-armv7.S
> > +++ b/arch/arm/cpu/cache-armv7.S
> > @@ -83,7 +83,10 @@ hierarchical:
> > > > > > > >  		ands	r3, r0, #0x7000000	@ extract loc from clidr
> > > > > > > >  		mov	r3, r3, lsr #23		@ left align loc bit field
> > > > > > > >  		beq	finished		@ if loc is 0, then no need to clean
> > > > > > > > -		mov	r12, #0			@ start clean at cache level 0
> > > > > > +		cmp	r8, #0
> > > > > > > > +THUMB(		ite	eq			)
> > > > > > +		moveq	r12, #0
> > > > > > > > +		subne	r12, r3, #2		@ start invalidate at outmost cache level
> >  loop1:
> > > > > > > >  		add	r2, r12, r12, lsr #1	@ work out 3x current cache level
> > > > > > > >  		mov	r1, r0, lsr r2		@ extract cache type bits from clidr
> > > > > > > > @@ -118,8 +121,15 @@ THUMB(		ite	eq			)
> > > > > > > >  		subs	r7, r7, #1		@ decrement the index
> > > > > >  		bge	loop2
> >  skip:
> > > > > > +		cmp	r8, #0
> > > > > > +		bne	inval_check
> > > > > > > >  		add	r12, r12, #2		@ increment cache number
> > > > > >  		cmp	r3, r12
> > > > > > +		b	loop_end_check
> > +inval_check:
> > > > > > +		cmp	r12, #0
> > > > > > > > +		sub	r12, r12, #2		@ decrement cache number
> > +loop_end_check:
> >  #ifdef CONFIG_ARM_ERRATA_814220
> > > >  		dsb
> >  #endif
> > 
> 
> 

_______________________________________________
barebox mailing list
barebox@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/barebox




[Index of Archives]     [Linux Embedded]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux