Re: [UPDATED PATCH] IP28 support

peter fuerst <post@xxxxxxxx> · Sun, 23 Dec 2007 02:44:01 +0100 (CET)

On Wed, 12 Dec 2007, Richard Sandiford wrote:

> Date: Wed, 12 Dec 2007 18:09:31 +0000
> From: Richard Sandiford <rsandifo@xxxxxxxxxxxxx>
> To: peter fuerst <pf@xxxxxxxx>
> Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>,
>      Thomas Bogendoerfer <tsbogend@xxxxxxxxxxxxxxxx>,
>      Kumba <kumba@xxxxxxxxxx>, linux-mips@xxxxxxxxxxxxxx
> Subject: Re: [UPDATED PATCH] IP28 support
>
> peter fuerst <pf@xxxxxxxx> writes:
> >> Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes:
> >> >> Then there's the language-lawyerly code I gave to Peter on gcc-patches@:
> >> >>
> >> >>      void foo (int x)
> >> >>      {
> >> >>        int array[1];
> >> >>        if (x)
> >> >>          bar (array[0x1fff]);
> >> >>      }
> >> >>
> >
> > A strange method to pass data... Of course, cooking up such an "ABI",
> > where local variables are accessed with a const offset that is not known at
> > compile-time to be valid, would subvert the test for $sp-based accesses...
>
> Well, as I said when I gave that example originally, it's unlikely that
> the example would be written in that form.  But hide the constants and
> checks in configurable macros, and the general idea becomes a little
> more feasible.
>
> >> FWIW, my first cut at the option restrictions were based on what
> >> the patch exempts (and doesn't exempt).  We could instead get gcc
> >> to only exempt accesses that it can prove are either to the current
> >> function's stack frame or to its stack arguments.  I.e. rather than
> >> consider every $sp-based access to be safe, we'd instead do some
> >
> > "every $sp-based access" (set(mem(plus(sp)(const_int)))) is restricted
> > to local variables too, with the constant offset being either
> > - compiler-generated or
> > - deliberately put in the source (however including the above example)
>
> That's not literally true.  SP+INT addresses can be used to access
> stack arguments too, and 4.x can optimise some varargs accesses to

"local variables" was meant to include (var-)argument-slots too, which are
allright, so far.

> compile-time base+offset addresses.  And as I said, the compiler is
> free to make up accesses that aren't in fact valid for cases where
> the access isn't made.  E.g. if you had a loop with a stride of 128,
> the compiler could unroll the loop as many times as it likes.  Some
> of the unrolled iterations might access areas outside the stack frame.

You mean, the compiler would generate $sp+const_int accesses here, whose
validity is not known at compile-time - similar to foo() above ?

> (You would hope that the compiler would be intelligent enough to crop
> the iteration count in such cases, because the extra iterations should
> never be used in valid code.  But that isn't the point.  The compiler
> doesn't _need_ to crop the iteration count for correctness, and we're
> talking about something we _do_ need for correctness.)
>
> >> bounds checking on the value.
> > Fine, if that is possible.
>
> FWIW, the frame info is available in cfun->machine->frame at the time
> your code runs.
>
> >>                                (We could also use MEM_ATTRS to
> >> pick up cases where a stack variable is acceesed via something
> >> other than the stack or frame pointers, as happens for large frames.)
> >
> > Aren't these always accesses with non-constant offset, where a CB can't be
> > avoided, even if they are recognized as being actually relative to $sp ?
>
> The MEM_ATTRS I meant were MEM_EXPR + MEM_OFFSET, which only apply where
> there is a known constant offset.
>
> >> > In case of a hypothetic multi-platform kernel of which at least one needs
> >> > the R10000 workarounds, all code would be uniformly compiled with the
> >> > magic -mr10k-cache-barrier option and all source level workaround would
> >> > be enabled.
> >>
> >> Hmm.  This probably shows I am misunderstanding the problem, but I was
> >> thinking about the IO-mapped case.  I thought one of the problems was
> >> that if you had a cached speculative load or store to an access-sensitive
> >> IO-mapped address, the IO-mapped device might "see" that access even if it
> >> doesn't take place.  Could you not have a situation where a KSEG0 or
> >> XKSEG0 access is access-sensitive on one machine and not another?
> >> The patch won't insert countermeasures before symbolic and constant
> >> addresses, because it believes all such addresses to be safe.
> >>
> >
> > The threat to IO-addresses comes from the addressing register in the speculated
> > mem-instruction (set(mem(plus(reg)...), containing one of the addresses as
> > "garbage".
> >
> > Symbolic addresses are well defined from link-time on, no matter what history
> > before the access.  They either point (set(mem(plus(symbol_ref)...) to
> > - some variable in the cached area, what is harmless (unless DMA-related),
> >   or to
> > - IO-devices, accessed uncached, i.e. non-speculative,
> > unless there is a programming-error ;)
> > The same holds for const_int used as address.
>
> I think you're missing my point.  If you access an I/O-mapped device
> through KSEG2 or an uncached XKPHYS address, is it not also physically
> possible (though clearly unwise) to access it through KSEG0 or a cached
> XKPHYS address too?  So can you guarantee that every const_int cached
> address in a multi-platform kernel is not I/O-mapped on any of the r10k
> platforms?  Or can you guarantee that the compiler will not manufacture
> such an address from an otherwise harmless address?
Hmm, it's not quite clear, how it could be manufactured.
>                                                     Again, the key thing
> is to think about what the compiler can validly do on non-r10k platforms,
> however silly it might seem, and then make sure the workarounds cope
> with it.

Do you think of accesses that essentially look like this ?

  if (machine_x)
     *uncached(const_addr) = val;
  else
     *cached(const_addr) = val;

Fortunately (at least? even?) on IP28 cached access (hence a block read
request) to an I/O-device address is a non-issue. In this respect the
hardware design seems to follow the recommendations from the R10000 manual
(NACK from external agent?):
- if such an access graduates (i.e. a "real" access), a bus-error will occur.
- if not (i.e. mis-speculated), nothing happens.

However, i don't yet know, how O2 behaves, or, if there exists any other
R10k-machine, which would need the software-workaround.

>
> >> I'm also a little worried that the compiler is free to make up accesses
> >> that didn't exist in the original program, provided that those accesses
> > The cache-barrier itself ?
>
> No, in general.  Optimisers (particularly loop optimisers) can invent
> accesses that didn't exist in the original source code.  Normally they
> would only be executed in correct circumstances, but with this
> speculative execution, that might not be true.
>
> >> are never actually performed in cases where they'd be wrong.  So how about:
> >>
> >> -mr10k-cache-barrier=load-store
> >>   Insert a cache barrier at the beginning of any sequentially-executed
> >>   series of instructions that contains a load or store.  For the purposes
> >>   of this option, GCC can ignore loads and stores that it can prove
> >>   are an in-range access to:
> >>
> >>   (a) the current function's stack frame;
> >>   (b) an incoming stack argument;
> >>   (b) an object with a link-time-constant address; or
> >>   (c) a block of uncached memory
> > Can we recognize uncached memory in the instruction ?
>
> Well, I was just thinking about teaching the compiler about KSEG2,
> the always-uncached XKPHYS addresses, etc.  (Sorry for messing up
> the bullet letters there!)  The idea is that we have a correlation
> between symbolic constants and C objects, so we can check whether
> an offset in a symbolic constant is within the object.  We already

No doubt, this would be very helpfull.

> have code to do this in other situations.  But there is no correlation
> between const_int addresses and C objects, and we cannot be sure that
> a given const_int address existed in the original source code, so
> I think the only safe thing is to check its uncached properties instead.
>
> I know all this must be frustrating.  I'm sure your patches work great
> as they are with current and past kernels, and current and past compilers.
> The problem is that, if it becomes a mainline gcc feature, it needs to be
> defined from first principles.

Agreed, the design of any feature advantageously should be based on a clear
(more or less formal) specification of what the compiler can do.

>                                 And we need to do that without assuming
> that the accesses we're looking at existed in the original source code.
>
> FWIW, I'm happy to help update the patch once we've agreed on an

This would be appreciated (of course :) Many thanks in advance!

> option spec.

Well, the option spec could be as listed above. With "store" as default
for an empty option-string ("none" as default if the option isn't given
at all).

>
> Richard
>
>

kind regards

peter

PS: apologies for delaying the answer, i just couldn't concentrate on
this topic recently.