On Wed, Jul 14, 2004 at 05:35:19PM +0100, Dominic Sweetman wrote:

> If you use hit-type cache operations in a kernel routine, then you're
> safe.  I can't envisage any circumstance in which Linux would try to
> invalidate kernel mainline code locations from the I-cache (well, you
> might be doing something fabulous with debugging the kernel, but
> that's not normal and you'd hardly expect to be able to support such
> an activity with standard cache management calls).
>
> So this problem can only arise on index-type I-cache invalidation.  I
> claim that a running kernel on a MIPS CPU should only use index-type
> invalidation when it is necessary to invalidate the entire I-cache.
> (If you use index-type operations for a range which doesn't resolve
> to "the whole cache", then that should be fixed.)
>
> That implies that a MIPS32-paranoid "invalidate-whole-I-cache"
> routine should:
>
> 1. Identify which indexes might alias to cache lines containing the
>    routine's own 'cache invalidate' instruction(s), and thus hit the
>    problem.  There won't be that many of them.
>
> 2. Arrange to skip those indexes when zapping the cache, then do
>    something weird to invalidate that handful of lines.  You could do
>    that by running uncached, but you could also do it just by using
>    some auxiliary routine which is known to be more than a cache line
>    but much less than a whole I-cache span distant, so it can't
>    possibly alias to the same thing...
>
> This is fiddly, but not terribly difficult, and should have a
> negligible performance impact.
>
> Does that make sense?  Am I now, having named the solution,
> responsible for figuring out a patch (yeuch, I never wanted to be a
> kernel programmer again...)?

You don't have to :-)  What became an architectural restriction for
MIPS32 had already shown up earlier as an erratum for the TX49/H2
core.  This is the solution which we currently have in the kernel
code:

#define JUMP_TO_ALIGN(order)					\
	__asm__ __volatile__(					\
		"b\t1f\n\t"					\
		".align\t" #order "\n\t"			\
		"1:\n\t"					\
		)

#define CACHE32_UNROLL32_ALIGN	JUMP_TO_ALIGN(10)	/* 32 * 32 = 1024 */
#define CACHE32_UNROLL32_ALIGN2	JUMP_TO_ALIGN(11)

static inline void mips32_blast_icache32(void)
{
	unsigned long start = INDEX_BASE;
	unsigned long end = start + current_cpu_data.icache.waysize;
	unsigned long ws_inc = 1UL << current_cpu_data.icache.waybit;
	unsigned long ws_end = current_cpu_data.icache.ways <<
	                       current_cpu_data.icache.waybit;
	unsigned long ws, addr;

	/*
	 * Each cache32_unroll32() call covers one 0x400-byte chunk of
	 * indexes (32 lines of 32 bytes).  Aligning to 2^11 places this
	 * code at the start of an even chunk, so the first pass can
	 * invalidate all odd chunks without hitting itself.
	 */
	CACHE32_UNROLL32_ALIGN2;
	/* I'm in even chunk.  blast odd chunks */
	for (ws = 0; ws < ws_end; ws += ws_inc)
		for (addr = start + 0x400; addr < end; addr += 0x400 * 2)
			cache32_unroll32(addr|ws, Index_Invalidate_I);

	/*
	 * Realigning to 2^10 moves execution into an odd chunk, so the
	 * second pass can invalidate the even chunks.
	 */
	CACHE32_UNROLL32_ALIGN;
	/* I'm in odd chunk.  blast even chunks */
	for (ws = 0; ws < ws_end; ws += ws_inc)
		for (addr = start; addr < end; addr += 0x400 * 2)
			cache32_unroll32(addr|ws, Index_Invalidate_I);
}

All it takes is using this for all MIPS32 / MIPS64 processors, or
maybe even all processors, plus some tuning of the constants to make
it suitable for all possible I-cache configurations.

  Ralf
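
For reference, cache32_unroll32() is not shown in the mail.  In kernels
of that era it comes from include/asm-mips/r4kcache.h and expands to 32
back-to-back index-type cache instructions at a 32-byte stride, so each
call covers exactly one 0x400-byte chunk.  A sketch along those lines
(paraphrased, not a verbatim copy of the kernel header):

#define cache32_unroll32(base, op)				\
	__asm__ __volatile__(					\
	"	.set push				\n"	\
	"	.set noreorder				\n"	\
	"	.set mips3				\n"	\
	"	cache %1, 0x000(%0); cache %1, 0x020(%0)	\n"	\
	"	cache %1, 0x040(%0); cache %1, 0x060(%0)	\n"	\
	"	cache %1, 0x080(%0); cache %1, 0x0a0(%0)	\n"	\
	"	cache %1, 0x0c0(%0); cache %1, 0x0e0(%0)	\n"	\
	"	cache %1, 0x100(%0); cache %1, 0x120(%0)	\n"	\
	"	cache %1, 0x140(%0); cache %1, 0x160(%0)	\n"	\
	"	cache %1, 0x180(%0); cache %1, 0x1a0(%0)	\n"	\
	"	cache %1, 0x1c0(%0); cache %1, 0x1e0(%0)	\n"	\
	"	cache %1, 0x200(%0); cache %1, 0x220(%0)	\n"	\
	"	cache %1, 0x240(%0); cache %1, 0x260(%0)	\n"	\
	"	cache %1, 0x280(%0); cache %1, 0x2a0(%0)	\n"	\
	"	cache %1, 0x2c0(%0); cache %1, 0x2e0(%0)	\n"	\
	"	cache %1, 0x300(%0); cache %1, 0x320(%0)	\n"	\
	"	cache %1, 0x340(%0); cache %1, 0x360(%0)	\n"	\
	"	cache %1, 0x380(%0); cache %1, 0x3a0(%0)	\n"	\
	"	cache %1, 0x3c0(%0); cache %1, 0x3e0(%0)	\n"	\
	"	.set pop				\n"	\
		:						\
		: "r" (base), "i" (op))

With op = Index_Invalidate_I, each cache instruction invalidates the
line at the given index, so one macro invocation zaps 32 consecutive
I-cache lines.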
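
The "tuning of constants" is mechanical: with 16-byte lines, the 32
unrolled operations only cover 0x200 bytes, so the chunk stride and
both alignment orders shrink by one.  A hypothetical variant for
16-byte lines (cache16_unroll32 and the names below are illustrative,
assuming an unroll macro analogous to the one above):

#define CACHE16_UNROLL32_ALIGN	JUMP_TO_ALIGN(9)	/* 32 * 16 = 512 */
#define CACHE16_UNROLL32_ALIGN2	JUMP_TO_ALIGN(10)

static inline void mips32_blast_icache16(void)
{
	unsigned long start = INDEX_BASE;
	unsigned long end = start + current_cpu_data.icache.waysize;
	unsigned long ws_inc = 1UL << current_cpu_data.icache.waybit;
	unsigned long ws_end = current_cpu_data.icache.ways <<
	                       current_cpu_data.icache.waybit;
	unsigned long ws, addr;

	/* Start in an even 0x200-byte chunk, blast the odd ones. */
	CACHE16_UNROLL32_ALIGN2;
	for (ws = 0; ws < ws_end; ws += ws_inc)
		for (addr = start + 0x200; addr < end; addr += 0x200 * 2)
			cache16_unroll32(addr|ws, Index_Invalidate_I);

	/* Now in an odd chunk, blast the even ones. */
	CACHE16_UNROLL32_ALIGN;
	for (ws = 0; ws < ws_end; ws += ws_inc)
		for (addr = start; addr < end; addr += 0x200 * 2)
			cache16_unroll32(addr|ws, Index_Invalidate_I);
}

The same pattern extends to 64-byte lines with an 0x800-byte stride
and .align 11 / .align 12.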