On Tue, Jun 08, 2021 at 11:30:36AM +0200, Marco Elver wrote:

> The cleaner approach would be an expression wrapper, e.g. "if
> (ctrl_depends(A) && B) { ... }".
>
> I imagine syntactically it'd be similar to __builtin_expect(..). I
> think that's also easier to request an extension for, say
> __builtin_ctrl_depends(expr). (If that is appealing, we can try and
> propose it as std::ctrl_depends() along with std::dependent_ptr<>.)
>
> Thoughts?

Works for me; and note how it mirrors how we implemented volatile_if()
in the first place, by doing an expression wrapper.

__builtin_ctrl_depends(expr) would have to:

 - ensure !__builtin_constant_p(expr)              (A)
 - imply an acquire compiler fence                 (B)
 - ensure cond-branch is emitted                   (C)

*OR*

 - ensure !__builtin_constant_p(expr);             (A)
 - upgrade the load in @expr to load-acquire       (D)


A) This all hinges on there actually being a LOAD; if expr is constant,
   we have a malformed program and can emit a compiler error.

B) We want to capture any store, not just volatile stores that come
   after. The example here is a ring-buffer that loads the (head and)
   tail pointer to check for space and then writes data elements. It
   would be 'cumbersome' to have all the data writes as volatile.

C) We depend on the load-to-branch data dependency to guard the store to
   provide the LOAD->STORE memory order.

D) Upgrading LOAD to LOAD-ACQUIRE also provides LOAD->STORE ordering,
   but it does require that the compiler has access to the LOAD in the
   first place, which isn't a given seeing how much asm() we have
   around. Also the architecture should have a cheap LOAD-ACQUIRE in the
   first place, otherwise there's no point.

   If this is done, the branch is allowed to be optimized away if the
   compiler so wants.

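
To make the ring-buffer case from (B) concrete, here is a rough
user-space sketch of the two variants, assuming C11 <stdatomic.h> rather
than kernel primitives. struct rb, rb_put_ctrl(), rb_put_acquire() and
rb_get() are made-up names for illustration only. Note that ISO C does
not promise the branch in the first variant survives optimisation; that
is the guarantee (C) would have to add, so treat rb_put_ctrl() as the
shape of the code and not as something that is ordered today.

```c
/* Sketch only: user-space C11, hypothetical names, not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RB_SIZE 8u	/* number of slots */

/* Index masking below only works for power-of-two sizes. */
static_assert((RB_SIZE & (RB_SIZE - 1)) == 0, "RB_SIZE must be a power of two");

struct rb {
	_Atomic unsigned int head;	/* advanced by the producer */
	_Atomic unsigned int tail;	/* advanced by the consumer */
	int data[RB_SIZE];		/* plain, non-volatile data slots */
};

/*
 * Variant (A)+(B)+(C): plain (relaxed) load, conditional branch, and a
 * compiler-only acquire fence so later stores are not hoisted above the
 * load at compile time. Nothing in ISO C forces the branch to be kept.
 */
static bool rb_put_ctrl(struct rb *rb, int v)
{
	unsigned int head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);

	if (head - tail >= RB_SIZE)
		return false;				/* no space */

	atomic_signal_fence(memory_order_acquire);	/* compiler fence only */

	rb->data[head & (RB_SIZE - 1)] = v;		/* ordinary store */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return true;
}

/*
 * Variant (A)+(D): the tail load is upgraded to load-acquire, which
 * orders it before the data store; the branch is no longer load-bearing.
 */
static bool rb_put_acquire(struct rb *rb, int v)
{
	unsigned int head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

	if (head - tail >= RB_SIZE)
		return false;				/* no space */

	rb->data[head & (RB_SIZE - 1)] = v;
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side, included only so the sketch can be exercised. */
static bool rb_get(struct rb *rb, int *out)
{
	unsigned int tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&rb->head, memory_order_acquire);

	if (head == tail)
		return false;				/* empty */

	*out = rb->data[tail & (RB_SIZE - 1)];
	atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
	return true;
}
```

Either way the data writes stay ordinary stores, which is the point of
(B): only the load that guards them needs special treatment.
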
Now, Will will also want to allow the load-acquire to be run-time
patched between the RCsc and RCpc variants depending on what ARMv8
extensions are available, which will be 'interesting' (although I can
think of ways to actually do that; one would be to keep a special
section that tracks the locations of these __builtin_ctrl_depends()
generated load-acquire instructions).