On 29 October 2024 at 3:20 PM, DiTBho Down in The Bunny hole wrote:
> hi
> "Speculative Execution" is a feature of the R10000 Processor.
>
> I read that it is problematic on machines that are Non-Cache
> Coherent, such as the IP28 Indigo2 and on R10000/R12000-based IP32 O2
> systems.
>

Hi,

> The first thing I don't understand is what does it mean that they are
> "Non-Cache Coherent" systems.

A "non-cache-coherent" system is one where the hardware does not
maintain coherency between the CPU caches and DMA from external bus
masters. In practice this means software has to write back and/or
invalidate the affected cache lines before and after each DMA request.

>
> I mean, I know what coherence of the cache means in a multiprocessor
> environment, but I'm a bit confused by what I read for these mono
> processor systems.
>
> As far as I understand, in the R4k and R10k architecture "coherency
> logic" should be on-chip of all the participating agents, both CPU(s)
> and DMA masters, but I didn't understand if being a "cache coherent
> system" depends only on a hw circuit implemented in the CPU (on-chip
> -> inside the CPU) or if there is a need for circuits external to the
> CPU (on-chip -> inside the bus controller, or something).

I think the answer is yes: external circuitry is required for a fully
coherent R10k system.

See the R10k user manual, section 2.1 "Uniprocessor Systems", which says:

"If hardware I/O coherency is desired, the external agent may use the
multiprocessor primitives provided by the processor to maintain cache
coherency for interventions and invalidations. External duplicate tags
can be used by the external agent to filter external coherency
requests."

My interpretation is that the bus controller has to send cache
intervention requests to the CPU in response to requests from other bus
masters, so that coherency is actively maintained. See also "Table 6-12
Encoding of SysCmd[7:5] for External Request" for some implementation
details.
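
To make the writeback/invalidate point concrete, here is a minimal
sketch in C. It is only a toy model under my own assumptions, not real
kernel code and not an R10k interface: it models one cache line in
front of a small RAM, and every name in it (`toy_system`, `cpu_store`,
`cache_writeback`, `dma_read`, and so on) is hypothetical. The "DMA
master" reads and writes RAM directly and never sees the cache, which
is roughly what "non-cache-coherent" means on a uniprocessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32 /* assumed line size for this toy model only */

/* Hypothetical toy system: one cache line in front of a tiny RAM. */
struct toy_system {
    uint8_t ram[LINE_SIZE];   /* what a DMA master reads and writes */
    uint8_t cache[LINE_SIZE]; /* the CPU's private copy of the line */
    int valid;                /* line is present in the cache */
    int dirty;                /* cache holds data newer than RAM */
};

void sys_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);
}

/* On a miss, refill the line from RAM (no hardware coherency here). */
static void cache_fill(struct toy_system *s)
{
    if (!s->valid) {
        memcpy(s->cache, s->ram, LINE_SIZE);
        s->valid = 1;
        s->dirty = 0;
    }
}

/* CPU accesses always go through the cache (write-back policy). */
uint8_t cpu_load(struct toy_system *s, unsigned off)
{
    assert(off < LINE_SIZE);
    cache_fill(s);
    return s->cache[off];
}

void cpu_store(struct toy_system *s, unsigned off, uint8_t v)
{
    assert(off < LINE_SIZE);
    cache_fill(s);
    s->cache[off] = v;
    s->dirty = 1;
}

/* Software cache maintenance: push dirty data out to RAM. */
void cache_writeback(struct toy_system *s)
{
    if (s->valid && s->dirty) {
        memcpy(s->ram, s->cache, LINE_SIZE);
        s->dirty = 0;
    }
}

/* Software cache maintenance: drop the line so the next load refills. */
void cache_invalidate(struct toy_system *s)
{
    s->valid = 0;
    s->dirty = 0;
}

/* The DMA master bypasses the cache entirely. */
uint8_t dma_read(const struct toy_system *s, unsigned off)
{
    assert(off < LINE_SIZE);
    return s->ram[off];
}

void dma_write(struct toy_system *s, unsigned off, uint8_t v)
{
    assert(off < LINE_SIZE);
    s->ram[off] = v;
}
```

In this model, a `cpu_store()` followed directly by `dma_read()` gives
the device stale data until `cache_writeback()` is called, and a
`dma_write()` stays invisible to `cpu_load()` until the line is dropped
with `cache_invalidate()`. A coherent system would do the equivalent of
those two calls in hardware.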

If you want some background reading, "A Primer on Memory Consistency
and Cache Coherence" is a good starting point.

>
> a) SGI IP28/r10K -> not cache coherent, but uses R10k
> b) SGI IP32/r10K -> not cache coherent, but uses R10k
> c) SGI IP30/r10K -> cache coherent and uses R10k
>
> A few more details on the nature of Speculative Execution, and the
> issues it poses to the Indigo2 can be found at the following URLs:
>
> 1) MIPS R10000 Microprocessor User's Manual (pages 51-55)
> https://web.archive.org/web/20051028113506/http://techpubs.sgi.com/library/manuals/2000/007-2490-001/pdf/007-2490-001.pdf
>
> 2) Post to NetBSD sgimips Mailing List on 29 Jun 2000
> http://mail-index.netbsd.org/port-sgimips/2000/06/29/0006.html
>
> however there are no sw/hw examples.
>
> As far as I know, Linux never worked on O2/R10K, while it worked
> years ago (20?) on IP28 only with a patched gcc to force "cache
> barrier" workarounds.
>
> I cannot find those patches, and I haven't yet understood the issue.
>

This has already been merged into upstream GCC; see the
`-mr10k-cache-barrier=setting` option. The GCC documentation [1]
explains the nature of the problem. In short, per that documentation, a
speculatively executed store can load the target line into the cache
and mark it dirty even if the store is later aborted; if a DMA write to
the same memory lands before that line is flushed, the stale cached
data overwrites the DMA-ed data.

> -
>
> I see that Linux, NetBSD and OpenBSD all work fine on IP30, even with
> a couple of { R10K, R12K, R14K } CPUs!
>
> The R10K is documented as a four-way superscalar design that
> implements register renaming and executes instructions out-of-order.
>
> I wonder if the problem is related to this "out-of-order" nature of
> the CPU paired with the Branch Prediction and Speculative Execution
> nature of a purist RISC design.

Yes, it's related. The GCC documentation explains that very well.

>
> Thinking about that, although one or more instructions may begin
> execution during each cycle, and each instruction takes several or
> many cycles to complete, when a branch instruction is decoded, its
> branch condition may not yet be known.
> However, the R10000 processor can predict whether the branch is
> taken, and then continue decoding and executing subsequent
> instructions along the predicted path.
>
> When a branch prediction is wrong, the processor must back up to the
> original branch and take the other path. This technique is called
> "speculative execution", and whenever the processor discovers a
> mispredicted branch, it aborts all speculatively-executed instructions
> and restores the processor's state to the state it held before the
> branch.
>
> However - the manual says - the cache state is not restored, and this
> is clearly a side effect of speculative execution.
>
> Worse still, if the speculative path involved a Store Conditional
> (SC): will it be restored? No, because - the manual says - if the
> cache is involved, then it won't be restored, so this is a real mess
> that needs at least a sw barrier.

I don't really know the R10k implementation details, but IMHO, since SC
can only change a cache line between two exclusive states, it doesn't
matter that much.

>
> (I can't find software examples, I would like to read them)
>
> I wonder ... is there any hw-mechanism with the IP30 (missing in IP28
> and IP32) that saves you when you are playing with cached memory
> and/or LL/SC instructions in a conditional block (e.g. semaphore,
> mutex, etc)?
>
> Can someone explain this matter to me?

Thanks

[1]: https://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html

>
> Thanks
>
> D.

-- 
- Jiaxun