On Mon, 10 Dec 2007, Richard Sandiford wrote: > Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes: > >> Then there's the language-lawyerly code I gave to Peter on gcc-patches@: > >> > >> void foo (int x) > >> { > >> int array[1]; > >> if (x) > >> bar (array[0x1fff]); > >> } > >> A strange method to pass data... Of course, cooking up such an "ABI", where local variables are accessed with a const offset that is not known at compile-time to be valid, would subvert the test for $sp-based accesses... > >> This function is valid if x is never true, so we cannot assume that all > >> accesses off the stack and frame pointers are actually in-frame. You're > >> assuming either (i) the kernel doesn't use code like that or (ii) that > >> "garbage" addresses in the range [$sp - 0x8000, $sp + 0x7fff] will not > >> trigger the problem. I imagine both are reasonable assumptions, and I'm > >> perfectly happy for us to make them. But they're the kind of assumption > >> we need to state explicitly. > > > > Interesting test case. I've been thinking about it myself but in the end > > decieded to believe Peter's analysis since he's banged the head for longer > > to the wall about this problem that I have ;-) I'm quite but not absolutely > > certain that this case cannot happen for realworld code, so I'd rather > > err on the side of caution. > > > > Peter & Thomas - we could make the stack thing bullet proof by vmallocing > > stacks and ensuring a sufficient virtual address gap exists around the stack > > such that the stack is the only addressable thing in the range of > > $sp +0x7fff / -0x8000? ...but having an address-gap, virtual or by unused memory, should make it save even with such code. Typical "realworld" examples for speculative access to stack-variables are void foo (int x) void foo () { { int array[N]; int array[N], i; if (x < N) for (i = 0; i < N; i++) bar (array[x]); array[i] = 0; } } i.e. accesses with non-const offsets, i.e. no longer $sp-based, which always will trigger a CB. > > FWIW, my first cut at the option restrictions were based on what > the patch exempts (and doesn't exempt). We could instead get gcc > to only exempt accesses that it can prove are either to the current > function's stack frame or to its stack arguments. I.e. rather than > consider every $sp-based access to be safe, we'd instead do some "every $sp-based access" (set(mem(plus(sp)(const_int)))) is restricted to local variables too, with the constant offset being either - compiler-generated or - deliberately put in the source (however including the above example) > bounds checking on the value. Fine, if that is possible. > (We could also use MEM_ATTRS to > pick up cases where a stack variable is acceesed via something > other than the stack or frame pointers, as happens for large frames.) Aren't these always accesses with non-constant offset, where a CB can't be avoided, even if they are recognized as being actually relative to $sp ? > > >> Peter's patch also treated accesses to constant integer and symbolic > >> addresses as safe. Again, this involves making assumptions about how > >> constant integer and symbolic addresses are used, and this is a much > >> less obvious assumption than the stack one. > > > > The latter assumption is also needed for -msym32 kernels, so it's well > > proven to be valid. The former hold, too. > > > >> Again, I understand that > >> it's a reasonable assumption to make in the linux context, but it's one > >> we need to pin down. E.g. there must be no run-time guarding of > >> target-specific constant integer IO-mapped addresses in cases where > >> those addresses might trigger the problem on other systems that the > >> same kernel image supports. > > > > In case of a hypothetic multi-platform kernel of which at least one needs > > the R10000 workarounds, all code would be uniformly compiled with the > > magic -mr10k-cache-barrier option and all source level workaround would > > be enabled. > > Hmm. This probably shows I am misunderstanding the problem, but I was > thinking about the IO-mapped case. I thought one of the problems was > that if you had a cached speculative load or store to an access-sensitive > IO-mapped address, the IO-mapped device might "see" that access even if it > doesn't take place. Could you not have a situation where a KSEG0 or > XKSEG0 access is access-sensitive on one machine and not another? > The patch won't insert countermeasures before symbolic and constant > addresses, because it believes all such addresses to be safe. > The threat to IO-addresses comes from the addressing register in the speculated mem-instruction (set(mem(plus(reg)...), containing one of the addresses as "garbage". Symbolic addresses are well defined from link-time on, no matter what history before the access. They either point (set(mem(plus(symbol_ref)...) to - some variable in the cached area, what is harmless (unless DMA-related), or to - IO-devices, accessed uncached, i.e. non-speculative, unless there is a programming-error ;) The same holds for const_int used as address. If used for DMA and also directly accessed, symbolic addresses could be problematic though: extern char big_fat_dma_buffer[N]; if (!dma_running) *big_fat_dma_buffer = 0; else ... (However, as soon as accessed with non-constant offset - e.g. in a loop,... - the symbol_ref disappears from the mem-instruction which will trigger a CB) Btw. with 4.x symbolic addresses are practically (without backtrace analysis) not exempted from CBs, since they no longer show up in the mem-instruction and (set(mem(lo_sum(reg)... is seen instead. > I'm also a little worried that the compiler is free to make up accesses > that didn't exist in the original program, provided that those accesses The cache-barrier itself ? > are never actually performed in cases where they'd be wrong. So how about: > > -mr10k-cache-barrier=load-store > Insert a cache barrier at the beginning of any sequentially-executed > series of instructions that contains a load or store. For the purposes > of this option, GCC can ignore loads and stores that it can prove > are an in-range access to: > > (a) the current function's stack frame; > (b) an incoming stack argument; > (b) an object with a link-time-constant address; or > (c) a block of uncached memory Can we recognize uncached memory in the instruction ? > > It can also ignore sequences that are always immediately preceded by > an untaken branch-likely instruction. Fine! > > Here, a ``sequentially-executed series'' is one in which calls, > jumps and branches occur only as the last instruction. > > -mr10k-cache-barrier=store > Like -mr10k-cache-barrier=load-store, but ignore all loads. > > -mr10k-cache-barrier=none > ... > > Richard > > > kind regards peter