On 06/01/2015 20:09, Leonid Yegoshin wrote:
> The following series implements lightweight SYNC memory barriers for SMP Linux
> and a correct use of SYNCs around atomics, futexes, spinlocks etc LL-SC loops -
> the basic building blocks of any atomics in MIPS.
>
> Historically, a generic MIPS doesn't use memory barriers around LL-SC loops in
> atomics, spinlocks etc. However, Architecture documents never specify that LL-SC
> loop creates a memory barrier. Some non-generic MIPS vendors already feel
> the pain and enforces it. With introduction in a recent out-of-order superscalar
> MIPS processors an aggressive speculative memory read it is a problem now.
>
> The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is something
> very heavy because it was designed for propagating barrier down to memory.
> MIPS R2 introduced lightweight SYNC instructions which correspond to smp_*()
> set of SMP barriers. The description was very HW-specific and it was never
> used, however, it is much less trouble for processor pipelines and can be used
> in smp_mb()/smp_rmb()/smp_wmb() as is as in acquire/release barrier semantics.
> After prolonged discussions with HW team it became clear that lightweight SYNCs
> were designed specifically with smp_*() in mind but description is in timeline
> ordering space.
>
> So, the problem was spotted recently in engineering tests and it was confirmed
> with tests that without memory barrier load and store may pass LL/SC
> instructions in both directions, even in old MIPS R2 processors.
> Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire to
> this issue.
>
> 3 patches introduces a configurable control for lightweight SYNCs around LL/SC
> loops and for MIPS32 R2 it was allowed to choose an enforcing SYNCs or not
> (keep as is) because some old MIPS32 R2 may be happy without that SYNCs.
> In MIPS R6 I chose to have SYNC around LL/SC mandatory because all of that
> processors have an aggressive speculation and delayed write buffers. In that
> processors series it is still possible the use of SYNC 0 instead of
> lightweight SYNCs in configuration - just in case of some trouble in
> implementation in specific CPU. However, it is considered safe do not implement
> some or any lightweight SYNC in specific core because Architecture requires
> HW map of unimplemented SYNCs to SYNC 0.

How useful might this be for older hardware, such as the R10k CPUs? Would it
just fall back to the old sync insn?

--J
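
P.S. To make sure I understand the ordering being discussed, here is a rough
portable C11 sketch of it. It is not code from the series: sketch_lock_t,
sketch_trylock() and sketch_unlock() are hypothetical names, and the C11 fences
only stand in for the barriers placed around the LL/SC loop. Which SYNC type a
given toolchain emits for them on MIPS is an assumption I have not checked.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type for illustration only; not part of the patch series. */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} sketch_lock_t;

static inline bool sketch_trylock(sketch_lock_t *l)
{
    int expected = 0;

    /*
     * Relaxed read-modify-write: atomic, but it orders nothing around it.
     * This mirrors a bare LL/SC loop with no barrier.
     */
    bool ok = atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_relaxed, memory_order_relaxed);

    /*
     * Acquire fence after a successful RMW: later loads and stores in the
     * critical section may not be reordered before the lock acquisition.
     */
    if (ok)
        atomic_thread_fence(memory_order_acquire);
    return ok;
}

static inline void sketch_unlock(sketch_lock_t *l)
{
    /*
     * Release fence before the store: earlier loads and stores in the
     * critical section may not be reordered after the lock is dropped.
     */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}
```

If I read the cover letter right, the series would make these two fences cheap
lightweight SYNCs where the core implements them, and a core that does not
would treat them as SYNC 0.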