On Fri, Jun 04, 2021 at 03:56:16PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 02:44:22PM +0100, Will Deacon wrote:
> > On Fri, Jun 04, 2021 at 01:31:48PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 04, 2021 at 11:44:00AM +0100, Will Deacon wrote:
> > > > On Fri, Jun 04, 2021 at 12:12:07PM +0200, Peter Zijlstra wrote:
> > >
> > > > > Usage of volatile_if requires the @cond to be headed by a volatile load
> > > > > (READ_ONCE() / atomic_read() etc..) such that the compiler is forced to
> > > > > emit the load and the branch emitted will have the required
> > > > > data-dependency. Furthermore, volatile_if() is a compiler barrier, which
> > > > > should prohibit the compiler from lifting anything out of the selection
> > > > > statement.
> > > >
> > > > When building with LTO on arm64, we already upgrade READ_ONCE() to an RCpc
> > > > acquire. In this case, it would be really good to avoid having the dummy
> > > > conditional branch somehow, but I can't see a good way to achieve that.
> > >
> > > #ifdef CONFIG_LTO
> > > /* Because __READ_ONCE() is load-acquire */
> > > #define volatile_cond(cond) (cond)
> > > #else
> > > ....
> > > #endif
> > >
> > > Doesn't work? Bit naf, but I'm thinking it ought to do.
> >
> > The problem is with relaxed atomic RMWs; we don't upgrade those to acquire
> > atm as they're written in asm, but we'd need volatile_cond() to work with
> > them. It's a shame, because we only have RCsc RMWs on arm64, so it would
> > be a bit more expensive.
>
> Urgh, I see. Compiler can't really help in that case either I'm afraid.
> They'll never want to modify loads that originate in an asm(). They'll
> say to use the C11 _Atomic crud.

Indeed. That's partly what led me down the route of thinking about "control
ordering" to sit between relaxed and acquire. So you have READ_ONCE_CTRL()
instead of this, but then we can't play your asm goto trick.

If we could push the memory access _and_ the branch down into the new
volatile_if helper, a bit like we do for smp_cond_load_*(), that would
help but it makes the thing a lot harder to use.

In fact, maybe it's actually necessary to bundle the load and branch
together. I looked at some of the examples of compilers breaking control
dependencies from memory-barriers.txt and the "boolean short-circuit"
example seems to defeat volatile_if:

	void foo(int *x, int *y)
	{
		volatile_if (READ_ONCE(*x) || 1 > 0)
			WRITE_ONCE(*y, 42);
	}

Although we get a conditional branch emitted, it's headed by an immediate
move instruction and the result of the load is discarded:

  38:	d503233f 	paciasp
  3c:	b940001f 	ldr	wzr, [x0]
  40:	52800028 	mov	w8, #0x1                   	// #1
  44:	b5000068 	cbnz	x8, 50 <foo+0x18>
  48:	d50323bf 	autiasp
  4c:	d65f03c0 	ret
  50:	d503249f 	bti	j
  54:	52800548 	mov	w8, #0x2a                  	// #42
  58:	b9000028 	str	w8, [x1]
  5c:	d50323bf 	autiasp
  60:	d65f03c0 	ret

Will
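
For anyone wanting to try the short-circuit case outside the kernel tree,
here is a minimal userspace sketch. The READ_ONCE()/WRITE_ONCE() macros
below are simplified stand-ins built on volatile casts, not the kernel
definitions, and volatile_if() is assumed to be a plain 'if' because the
proposed asm goto version is not reproduced here. bar() is a hypothetical
contrast case in which the branch condition is the loaded value itself. The
sketch only shows the C-level semantics, namely that the condition in foo()
is always true whatever *x holds; it makes no claim about the code a given
compiler emits.

```c
/* Userspace stand-ins (assumption): NOT the kernel's definitions. */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Assumption: a plain 'if'; the proposed helper relies on asm goto. */
#define volatile_if(cond) if (cond)

/* The example from the mail: '1 > 0' is constant true, so the result of
 * the '||' never depends on the value loaded from *x. */
void foo(int *x, int *y)
{
	volatile_if (READ_ONCE(*x) || 1 > 0)
		WRITE_ONCE(*y, 42);
}

/* Hypothetical contrast: the condition is the loaded value alone, so
 * whether the store happens really does depend on *x. */
void bar(int *x, int *y)
{
	volatile_if (READ_ONCE(*x))
		WRITE_ONCE(*y, 42);
}
```

Calling foo() with *x == 0 still stores 42 to *y, whereas bar() leaves *y
untouched in that case.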