On Wed, Jun 20, 2018 at 11:31:55AM +0800, 陈华才 wrote:
> Loongson-3's Store Fill Buffer is nearly the same as your "Store Buffer",
> and it further weakens memory ordering. So, smp_cond_load_acquire() only
> needs a __smp_mb() before the loop, not after every READ_ONCE(). In other
> words, the following code is just OK:
>
> #define smp_cond_load_acquire(ptr, cond_expr)		\
> ({							\
> 	typeof(ptr) __PTR = (ptr);			\
> 	typeof(*ptr) VAL;				\
> 	__smp_mb();					\
> 	for (;;) {					\
> 		VAL = READ_ONCE(*__PTR);		\
> 		if (cond_expr)				\
> 			break;				\
> 		cpu_relax();				\
> 	}						\
> 	__smp_mb();					\
> 	VAL;						\
> })
>
> The __smp_mb() before the loop is used to avoid "reads prioritised over
> writes", which is caused by the SFB's weak ordering and is similar to the
> ARM11MPCore case (mentioned by Will Deacon).

Sure, but smp_cond_load_acquire() isn't the only place you'll see this
sort of pattern in the kernel. In other places, the only existing arch
hook is cpu_relax(), so unless you want to audit all loops and add a
special MIPS-specific smp_mb() to those that are affected, I think your
only option is to stick it in cpu_relax().

I assume you don't have a control register that can disable this
prioritisation in the SFB?
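If not, a minimal sketch of the cpu_relax() approach might look like the
following (untested, and I'm guessing at the Kconfig symbol; something
along the lines of CONFIG_CPU_LOONGSON3 in your arch headers):

	/*
	 * Sketch only: on Loongson-3 the SFB can prioritise reads over
	 * writes, so a pure compiler barrier lets a polling loop starve
	 * the remote writer.  A full memory barrier in cpu_relax()
	 * flushes pending stores before the next read of the flag.
	 */
	#ifdef CONFIG_CPU_LOONGSON3
	#define cpu_relax()	smp_mb()
	#else
	#define cpu_relax()	barrier()
	#endif

That keeps the workaround out of the generic polling loops and confines
it to your architecture code.

Will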