SGI MIPS, Speculative Execution issue

DiTBho Down in The Bunny hole <downinthebunnyhole@xxxxxxxxx> · Tue, 29 Oct 2024 16:20:33 +0100

hi
"Speculative execution" is a feature of the R10000 processor.

I read that it is problematic on machines that are not cache
coherent, such as the IP28 Indigo2 and the R10000/R12000-based IP32 O2
systems.

The first thing I don't understand is what it means for them to be
"non-cache-coherent" systems.

I mean, I know what cache coherence means in a multiprocessor
environment, but I'm a bit confused by what I read about these
uniprocessor systems.

As far as I understand, in the R4k and R10k architectures the
"coherency logic" should be on-chip in all the participating agents,
both CPU(s) and DMA masters. What I don't understand is whether being a
"cache coherent system" depends only on a hw circuit implemented in the
CPU (on-chip -> inside the CPU) or whether it also needs circuits
external to the CPU (off-chip -> inside the bus controller, or
something similar).
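
A toy model may help picture what "non-cache-coherent" means on a
uniprocessor: the other agent that touches memory is the DMA master,
not a second CPU. The Python sketch below is only an illustration under
simplifying assumptions (one memory word, one cache line, hypothetical
names such as `ToySystem`; it does not model any real SGI bus): on a
coherent system the bus logic invalidates the CPU's cached copy when a
device writes memory, while on a non-coherent one nothing does, so
software must invalidate the line itself.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical one-word system: main memory, one CPU cache line, one DMA master."""
    memory: int
    coherent: bool            # does the bus invalidate the CPU's copy on DMA writes?
    cached: int = 0           # the CPU's cached copy of the word
    line_valid: bool = False  # does the CPU cache currently hold the line?

    def cpu_load(self) -> int:
        # Hit if the line is valid; otherwise refill it from memory.
        if not self.line_valid:
            self.cached = self.memory
            self.line_valid = True
        return self.cached

    def dma_write(self, value: int) -> None:
        # DMA goes straight to memory. Only a coherent system also
        # invalidates the CPU's copy; a non-coherent one leaves it stale.
        self.memory = value
        if self.coherent:
            self.line_valid = False

    def sw_invalidate(self) -> None:
        # What software must do by hand on a non-coherent system.
        self.line_valid = False


coh = ToySystem(memory=1, coherent=True)
nc = ToySystem(memory=1, coherent=False)
for s in (coh, nc):
    s.cpu_load()      # both caches now hold 1
    s.dma_write(2)    # a device writes 2 into memory

print(coh.cpu_load())  # 2: the bus invalidated the stale line
print(nc.cpu_load())   # 1: stale data, nobody told the CPU cache
nc.sw_invalidate()
print(nc.cpu_load())   # 2
```

In this model the only difference between the two machines is who
invalidates the line: the hardware, or the software.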

a) SGI IP28/R10K -> not cache coherent, but uses R10K
b) SGI IP32/R10K -> not cache coherent, but uses R10K
c) SGI IP30/R10K -> cache coherent and uses R10K

A few more details on the nature of speculative execution, and on the
issues it poses for the Indigo2, can be found at the following URLs:

1) MIPS R10000 Microprocessor User's Manual (pages 51-55)
https://web.archive.org/web/20051028113506/http://techpubs.sgi.com/library/manuals/2000/007-2490-001/pdf/007-2490-001.pdf

2) Post to NetBSD sgimips Mailing List on 29 Jun 2000
http://mail-index.netbsd.org/port-sgimips/2000/06/29/0006.html

However, neither of them contains sw/hw examples.

As far as I know, Linux never worked on the O2/R10K, while it did work
on the IP28 years ago (20 years?), but only with a patched gcc that
forced "cache barrier" workarounds.

I cannot find those patches, and I haven't yet understood the issue.

-

I see that Linux, NetBSD and OpenBSD all work fine on IP30, even with
a couple of { R10K, R12K, R14K } CPUs!

The R10K is documented as a four-way superscalar design that
implements register renaming and executes instructions out-of-order.

I wonder if the problem is related to this "out-of-order" nature of
the CPU, paired with the branch prediction and speculative execution
of a purist RISC design.

Thinking about it: one or more instructions may begin execution during
each cycle, and each instruction takes several (or many) cycles to
complete, so when a branch instruction is decoded, its branch condition
may not yet be known. However, the R10000 processor can predict whether
the branch will be taken, and then continue decoding and executing
subsequent instructions along the predicted path.

Executing instructions along a predicted path before the branch is
resolved is called "speculative execution". When a prediction turns out
to be wrong, the processor must back up to the original branch and take
the other path: it aborts all speculatively executed instructions and
restores the processor's state to the state it held before the
branch.
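
The rollback described above can be sketched as a toy Python model,
assuming a much simplified CPU (the `ToyCPU` class and its methods are
hypothetical; the real R10000 recovers its state through register
renaming rather than by copying a register file): registers are
checkpointed at the branch and restored on a misprediction, but lines
that the aborted instructions filled into the cache stay there.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCPU:
    """Hypothetical CPU: architectural registers plus the set of cached addresses."""
    regs: dict = field(default_factory=dict)
    cache: set = field(default_factory=set)

    def load(self, reg: str, addr: int, memory: dict) -> None:
        self.cache.add(addr)           # a miss brings the line into the cache
        self.regs[reg] = memory[addr]

    def run_speculatively(self, path, memory: dict, mispredicted: bool) -> None:
        checkpoint = dict(self.regs)   # register state just before the branch
        path(self, memory)             # execute the predicted path early
        if mispredicted:
            self.regs = checkpoint     # abort: the registers are restored...
            # ...but self.cache is not rolled back, so the lines filled by
            # the aborted instructions stay cached.


memory = {0x100: 42}
cpu = ToyCPU(regs={"t0": 0})
cpu.run_speculatively(lambda c, m: c.load("t0", 0x100, m), memory, mispredicted=True)
print(cpu.regs["t0"])      # 0: the speculative load's result was discarded
print(0x100 in cpu.cache)  # True: its cache fill was not undone
```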

However, as the manual says, the cache state is not restored, and this
is clearly a side effect of speculative execution.

Worse still, what if the speculatively executed path contains a Store
Conditional (SC)? Will its effect be undone? Apparently not: according
to the manual, whatever involves the cache is not restored, so this is
a real mess that needs at least a sw barrier.
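
For stores, one possible failure mode can be shown with another toy
model. It assumes that an aborted speculative store can still fetch its
target line into the cache and mark it dirty; if a device then writes
the same memory by DMA, the later write-back of that line overwrites the
DMA'd data. All names are hypothetical, and the `barrier` flag merely
stands in for a "cache barrier" that keeps speculation from reaching the
store; it is not how the R10000 or any compiler actually implements one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLine:
    data: int
    dirty: bool = False


@dataclass
class NonCoherentBox:
    """Hypothetical non-coherent machine: memory plus a write-back cache, no DMA snooping."""
    memory: dict                                 # addr -> value
    cache: dict = field(default_factory=dict)    # addr -> ToyLine

    def speculative_store(self, addr: int, barrier: bool) -> None:
        """A store on a mispredicted path that is later aborted."""
        if barrier:
            return  # hypothetical barrier: speculation never reaches the store
        # Assumption of this model: the aborted store still fetches the
        # line and marks it dirty, although its value is discarded.
        line = self.cache.setdefault(addr, ToyLine(self.memory[addr]))
        line.dirty = True

    def dma_write(self, addr: int, value: int) -> None:
        self.memory[addr] = value  # the device writes memory; the cache is not told

    def evict(self, addr: int) -> None:
        line = self.cache.pop(addr, None)
        if line is not None and line.dirty:
            self.memory[addr] = line.data  # write-back of the stale line clobbers the DMA data


for barrier in (False, True):
    box = NonCoherentBox(memory={0x200: 0})
    box.speculative_store(0x200, barrier=barrier)
    box.dma_write(0x200, 99)   # a device fills the buffer
    box.evict(0x200)           # the line is later replaced (written back if dirty)
    print(barrier, box.memory[0x200])  # False 0 (DMA data lost), then True 99
```

Note that no instruction in the model ever commits the store; the damage
comes entirely from the cache line's dirty state.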

(I can't find software examples; I would like to read some.)

I wonder... is there any hw mechanism in the IP30 (missing in the IP28
and IP32) that saves you when you are working with cached memory and/or
LL/SC instructions in a conditional block (e.g. semaphores, mutexes,
etc.)?

Can someone explain this matter to me?

Thanks

D.