On 06/02/2015 05:59, Ralf Baechle wrote:
> On Tue, Jun 02, 2015 at 04:41:21AM -0400, Joshua Kinard wrote:
>
>> On 06/01/2015 20:09, Leonid Yegoshin wrote:
>>> The following series implements lightweight SYNC memory barriers for SMP
>>> Linux and a correct use of SYNCs around the LL-SC loops in atomics,
>>> futexes, spinlocks etc. - the basic building blocks of any atomics in MIPS.
>>>
>>> Historically, generic MIPS doesn't use memory barriers around LL-SC loops
>>> in atomics, spinlocks etc. However, the Architecture documents never
>>> specify that an LL-SC loop creates a memory barrier. Some non-generic MIPS
>>> vendors already feel the pain and enforce it. With the introduction of
>>> aggressive speculative memory reads in recent out-of-order superscalar
>>> MIPS processors, it is a problem now.
>>>
>>> The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is very
>>> heavy because it was designed to propagate the barrier down to memory.
>>> MIPS R2 introduced lightweight SYNC instructions which correspond to the
>>> smp_*() set of SMP barriers. The description was very HW-specific and they
>>> were never used; however, they are much less trouble for processor
>>> pipelines and can be used in smp_mb()/smp_rmb()/smp_wmb() as is, as well
>>> as for acquire/release barrier semantics. After prolonged discussions with
>>> the HW team it became clear that lightweight SYNCs were designed
>>> specifically with smp_*() in mind, but the description is in timeline
>>> ordering space.
>>>
>>> So, the problem was spotted recently in engineering tests, and tests
>>> confirmed that without a memory barrier, loads and stores may pass LL/SC
>>> instructions in both directions, even in old MIPS R2 processors.
>>> Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire
>>> to this issue.
>>>
>>> 3 patches introduce a configurable control for lightweight SYNCs around
>>> LL/SC loops. For MIPS32 R2 it is possible to choose between enforcing
>>> SYNCs or not (keep as is), because some old MIPS32 R2 cores may be happy
>>> without those SYNCs. In MIPS R6 I chose to make SYNC around LL/SC
>>> mandatory because all of those processors have aggressive speculation and
>>> delayed write buffers. In that processor series it is still possible to
>>> select SYNC 0 instead of lightweight SYNCs in the configuration - just in
>>> case of some implementation trouble in a specific CPU. However, it is
>>> considered safe not to implement some or all lightweight SYNCs in a
>>> specific core, because the Architecture requires HW to map unimplemented
>>> SYNCs to SYNC 0.
>>
>> How useful might this be for older hardware, such as the R10k CPUs? Or
>> does it just fall back to the old sync insn?
>
> The R10000 family is strongly ordered, so there is no SYNC instruction
> required in the entire kernel, even though some Origin hardware
> documentation incorrectly claims otherwise.

So no benefits even in the speculative execution case on noncoherent
hardware like IP28 and IP32?

--J