On Sat, Mar 23, 2024 at 05:40:23PM -0400, comex wrote:
> That may be true, but the LLVM issue you cited isn’t a good example.
> In that issue, the function being miscompiled doesn’t actually use any
> barriers or atomics itself; only the scaffolding around it does. The
> same issue would happen even if the scaffolding used LKMM atomics.
>
> For anyone curious: The problematic optimization involves an
> allocation (‘p’) that is initially private to the function, but is
> returned at the end of the function. LLVM moves a non-atomic store to
> that allocation across an external function call (to ‘foo’). This
> reordering would be blatantly invalid if any other code could observe
> the contents of the allocation, but is valid if the allocation is
> private to the function. LLVM assumes the latter: after all, the
> pointer to it hasn’t escaped. Yet. Except that in a weak memory
> model, the escape can ‘time travel’...

It's hard to understand exactly what you mean, but consider the
following example:

	int *globalptr;
	int x;

	int *f() {
		int *p = kzalloc(sizeof(int));

		L1: *p = 1;
		L2: foo();
		return p;
	}

	void foo() {
		smp_store_release(&x, 2);
	}

	void thread0() {
		WRITE_ONCE(globalptr, f());
	}

	void thread1() {
		int m, n;
		int *q;

		m = smp_load_acquire(&x);
		q = READ_ONCE(globalptr);
		if (m && q)
			n = *q;
	}

(If you like, pretend each of these function definitions lives in a
different source file -- it doesn't matter.)

With no optimization, whenever thread1() reads *q it will always obtain
1, thanks to the store-release in foo() and the load-acquire() in
thread1(). But if the compiler swaps L1 and L2 in f() then this is not
guaranteed. On a weakly ordered architecture, thread1() could then get
0 from *q.

I don't know if this is what you meant by "in a weak memory model, the
escape can ‘time travel’". Regardless, it seems very clear that any
compiler which swaps L1 and L2 in f() has a genuine bug.

Alan Stern
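
For anyone who wants to poke at the example outside the kernel, here is a rough userspace sketch. It assumes C11 atomics and pthreads are acceptable stand-ins for the LKMM primitives: a release store for smp_store_release(), an acquire load for smp_load_acquire(), relaxed atomics for WRITE_ONCE()/READ_ONCE(), and calloc() for kzalloc(). The run_trials() helper and the "bad" flag are hypothetical names added for illustration. A clean run only shows that the guarantee was not observed to fail on that machine and compiler; it cannot prove the compiler is correct, and a strongly ordered CPU may hide a reordering of L1 and L2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel variables in the example. */
static _Atomic(int *) globalptr;
static atomic_int x;

static void foo(void)
{
	/* Stand-in for smp_store_release(&x, 2). */
	atomic_store_explicit(&x, 2, memory_order_release);
}

static int *f(void)
{
	int *p = calloc(1, sizeof(*p));	/* stand-in for kzalloc() */

	if (!p)
		abort();
	*p = 1;		/* L1: plain store to the still-private allocation */
	foo();		/* L2: external call containing the release store */
	return p;
}

static void *thread0(void *arg)
{
	(void)arg;
	/* Stand-in for WRITE_ONCE(globalptr, f()). */
	atomic_store_explicit(&globalptr, f(), memory_order_relaxed);
	return NULL;
}

static void *thread1(void *arg)
{
	int *bad = arg;
	int m = atomic_load_explicit(&x, memory_order_acquire);
	int *q = atomic_load_explicit(&globalptr, memory_order_relaxed);

	/* If both loads saw thread0's stores, *q is required to be 1. */
	if (m && q && *q != 1)
		*bad = 1;
	return NULL;
}

/* Hypothetical helper: runs the two threads 'trials' times and returns
 * how many times thread1 read something other than 1 through q. */
int run_trials(int trials)
{
	int violations = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;
		int bad = 0;

		atomic_store(&x, 0);
		atomic_store(&globalptr, NULL);
		if (pthread_create(&t0, NULL, thread0, NULL) ||
		    pthread_create(&t1, NULL, thread1, &bad))
			abort();
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		assert(atomic_load(&x) == 2);	/* thread0 has finished */
		violations += bad;
		free(atomic_load(&globalptr));
	}
	return violations;
}
```

Calling run_trials() from a small main() and building with -pthread is enough to try it. Any nonzero return value would mean thread1 read a value other than 1 through q, which is the outcome the release/acquire pair is supposed to rule out.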