Re: [PATCH 0/3] MIPS: SMP memory barriers: lightweight sync, acquire-release

Joshua Kinard <kumba@xxxxxxxxxx> · Tue, 02 Jun 2015 04:41:21 -0400

On 06/01/2015 20:09, Leonid Yegoshin wrote:
> The following series implements lightweight SYNC memory barriers for SMP Linux
> and a correct use of SYNCs around atomics, futexes, spinlocks etc LL-SC loops -
> the basic building blocks of any atomics in MIPS.
> 
> Historically, a generic MIPS doesn't use memory barriers around LL-SC loops in
> atomics, spinlocks etc. However, Architecture documents never specify that LL-SC
> loop creates a memory barrier. Some non-generic MIPS vendors already feel
> the pain and enforces it. With introduction in a recent out-of-order superscalar
> MIPS processors an aggressive speculative memory read it is a problem now.
> 
> The generic MIPS memory barrier instruction SYNC (aka SYNC 0) is something
> very heavvy because it was designed for propogating barrier down to memory.
> MIPS R2 introduced lightweight SYNC instructions which correspond to smp_*()
> set of SMP barriers. The description was very HW-specific and it was never
> used, however, it is much less trouble for processor pipelines and can be used
> in smp_mb()/smp_rmb()/smp_wmb() as is as in acquire/release barrier semantics.
> After prolonged discussions with HW team it became clear that lightweight SYNCs
> were designed specifically with smp_*() in mind but description is in timeline
> ordering space.
> 
> So, the problem was spotted recently in engineering tests and it was confirmed
> with tests that without memory barrier load and store may pass LL/SC
> instructions in both directions, even in old MIPS R2 processors.
> Aggressive speculation in MIPS R6 and MIPS I5600 processors adds more fire to
> this issue.
> 
> 3 patches introduces a configurable control for lightweight SYNCs around LL/SC
> loops and for MIPS32 R2 it was allowed to choose an enforcing SYNCs or not
> (keep as is) because some old MIPS32 R2 may be happy without that SYNCs.
> In MIPS R6 I chose to have SYNC around LL/SC mandatory because all of that
> processors have an agressive speculation and delayed write buffers. In that
> processors series it is still possible the use of SYNC 0 instead of
> lightweight SYNCs in configuration - just in case of some trouble in
> implementation in specific CPU. However, it is considered safe do not implement
> some or any lightweight SYNC in specific core because Architecture requires
> HW map of unimplemented SYNCs to SYNC 0.

How useful might this be for older hardware, such as the R10k CPUs?  Just
fallbacks to the old sync insn?

--J