On Fri, Apr 16, 2021 at 4:24 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> Similar thing for RCU; C11 can't optimally do that; it needs to make
> rcu_dereference() a load-acquire [something ARM64 has already done in C
> because the compiler might be too clever by half when doing LTO :-(].
> But it's the compiler needing the acquire semantics, not the computer,
> which is just bloody wrong.

You may already know, but perhaps worth clarifying:

C11 does have atomic_signal_fence(), which is a compiler fence. But a
compiler fence only ensures the loads will be emitted in the right
order, not that the CPU will execute them in the right order. CPU
architectures tend to guarantee that two loads will be executed in the
right order if the second one's address depends on the first one's
result, but a dependent load can stop being dependent after compiler
optimizations involving value speculation. Using a load-acquire works
around this, not because it stops the compiler from performing any
optimization, but because it tells the computer to execute the loads in
the right order *even if* the compiler has broken the value dependence.

So C11 atomics don't make the situation worse, compared to Linux's
atomics implementation based on volatile and inline assembly. Both are
unsound in the presence of value speculation. C11 atomics were
*supposed* to make the situation better, with memory_order_consume,
which would have specifically forbidden the compiler from performing
value speculation. But all the compilers punted on getting this to work
and instead just implemented memory_order_consume as
memory_order_acquire.

As for Rust, it compiles to the same LLVM IR that Clang compiles C
into. Volatile, inline assembly, and C11-based atomics: all of these
are available in Rust, and generate exactly the same code as their C
counterparts, for better or for worse. Unfortunately, the Rust project
has relatively limited muscle when it comes to contributing to LLVM.
So while it would definitely be nice if Rust could make RCU sound, and
from a specification perspective I think Rust people would be quite
willing, and probably easier to work with than the C committee... I
suspect that implementing this would require the kind of sweeping change
to LLVM that is probably not going to come from Rust.

There are other areas where I think that kind of discussion might be
more fruitful. For example, the Rust documentation currently says that
a volatile read racing with a non-volatile write (as in seqlocks) is
undefined behavior. [1] However, I am of the opinion that this is
essentially a spec bug, for reasons that are probably not worth getting
into here.

[1] https://doc.rust-lang.org/nightly/std/ptr/fn.read_volatile.html
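
To make the compiler-fence vs. load-acquire distinction above concrete,
here is a minimal C11 sketch. It is not kernel code and not how
rcu_dereference() is actually implemented; struct config,
current_config, publish(), read_value_acquire() and
read_value_fence_only() are names made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload reached through a shared pointer. */
struct config {
	int value;
};

/* Hypothetical RCU-style published pointer. */
static _Atomic(struct config *) current_config = NULL;

/* Writer: fill in the object, then publish the pointer with release. */
static void publish(struct config *c, int value)
{
	assert(c != NULL);
	c->value = value;
	atomic_store_explicit(&current_config, c, memory_order_release);
}

/*
 * Reader, load-acquire variant: the acquire load orders the later
 * c->value load after it on the CPU as well, so the result does not
 * rely on the compiler preserving the address dependency.
 */
static int read_value_acquire(int fallback)
{
	struct config *c =
		atomic_load_explicit(&current_config, memory_order_acquire);

	return c ? c->value : fallback;
}

/*
 * Reader, compiler-fence-only variant: atomic_signal_fence() restricts
 * compiler reordering but emits no CPU barrier. On weakly ordered
 * hardware this leans on the CPU's address-dependency ordering, which
 * C11 does not promise and which value speculation could break.
 */
static int read_value_fence_only(int fallback)
{
	struct config *c =
		atomic_load_explicit(&current_config, memory_order_relaxed);

	atomic_signal_fence(memory_order_acquire);
	return c ? c->value : fallback;
}
```

Single-threaded, both readers return the same thing; the difference
only matters when a writer on another CPU calls publish() concurrently.
memory_order_consume would in theory sit between the two variants, but
as noted above compilers just treat it as memory_order_acquire.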