On Thu, Apr 6, 2017 at 12:23 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> Something like so then. According to the SDM mwait is a no-op if we do
> not execute monitor first. So this variant should get the first
> iteration without expensive instructions.

No, the problem is that we *would* have executed a prior monitor that
could still be pending - from a previous invocation of
smp_cond_load_acquire().

Especially with spinlocks, these things can very much happen
back-to-back.

And it would be pending with a different address (the previous
spinlock) that might not have changed since then (and might not be
changing), so now we might actually be pausing in mwait waiting for
that *other* thing to change.

So it would probably need to do something complicated like

    #define smp_cond_load_acquire(ptr, cond_expr)                   \
    ({                                                              \
            typeof(ptr) __PTR = (ptr);                              \
            typeof(*ptr) VAL;                                       \
            do {                                                    \
                    VAL = READ_ONCE(*__PTR);                        \
                    if (cond_expr)                                  \
                            break;                                  \
                    for (;;) {                                      \
                            ___monitor(__PTR, 0, 0);                \
                            VAL = READ_ONCE(*__PTR);                \
                            if (cond_expr)                          \
                                    break;                          \
                            ___mwait(0xf0 /* C0 */, 0);             \
                    }                                               \
            } while (0);                                            \
            smp_acquire__after_ctrl_dep();                          \
            VAL;                                                    \
    })

which might just generate nasty enough code to not be worth it.

I dunno

                 Linus
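For comparison, here is a minimal userspace sketch of the plain spinning form that the MONITOR/MWAIT variant is trying to improve on. It assumes C11 atomics. MONITOR/MWAIT are left out because they are normally not usable from user mode. The names `cond_load_acquire_u32`, `cond_fn`, `cond_equals` and `cond_nonzero` are hypothetical, and the macro's `cond_expr` is modelled as a callback. As in the macro above, the value is checked with a cheap load before any waiting. Acquire ordering is then supplied by a fence after the control dependency, standing in for `smp_acquire__after_ctrl_dep()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the macro's cond_expr over VAL. */
typedef int (*cond_fn)(uint32_t val, uint32_t arg);

static int cond_equals(uint32_t val, uint32_t arg) { return val == arg; }
static int cond_nonzero(uint32_t val, uint32_t arg) { (void)arg; return val != 0; }

/*
 * Sketch only: spin until cond(val, arg) holds, then return the value
 * that satisfied it with acquire ordering. No MONITOR/MWAIT is used,
 * so there is no armed-monitor state that could go stale between calls.
 */
static uint32_t cond_load_acquire_u32(_Atomic uint32_t *p, cond_fn cond,
                                      uint32_t arg)
{
    uint32_t val;

    assert(p && cond);
    for (;;) {
        /* Cheap relaxed load first, playing the role of READ_ONCE(). */
        val = atomic_load_explicit(p, memory_order_relaxed);
        if (cond(val, arg))
            break;
        /* A real spin loop would issue a pause/relax hint here. */
    }
    /* Upgrade the satisfying load to acquire semantics. */
    atomic_thread_fence(memory_order_acquire);
    return val;
}
```

If the condition already holds, the loop exits on the first load without any waiting, which is the "first iteration without expensive instructions" property discussed above.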