On Fri, Sep 27, 2024 at 03:20:40AM +0200, Mathieu Desnoyers wrote:
> On 2024-09-26 18:12, Linus Torvalds wrote:
> > On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> > <jonas.oberhauser@xxxxxxxxxxxxxxx> wrote:
> > >
> > > No, the issue introduced by the compiler optimization (or by your
> > > original patch) is that the CPU can speculatively load from the first
> > > pointer as soon as it has completed the load of that pointer:
> >
> > You mean the compiler can do it. The inline asm has no impact on what
> > the CPU does. The conditional isn't a barrier for the actual hardware.
> > But once the compiler doesn't try to do it, the data dependency on the
> > address does end up being an ordering constraint on the hardware too
> > (I'm happy to say that I haven't heard from the crazies that want
> > value prediction in a long time).
> >
> > Just use a barrier. Or make sure to use the proper ordered memory
> > accesses when possible. Don't use an inline asm for the compare - we
> > don't even have anything insane like that as a portable helper, and we
> > shouldn't have it.
>
> How does the compiler barrier help in any way here ?
>
> I am concerned about the compiler SSA GVN (Global Value Numbering)
> optimizations, and I don't think a compiler barrier solves anything.
> (or I'm missing something obvious)

I think you're right, a compiler barrier doesn't help here:

	head = READ_ONCE(p);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(p);
	if (ptr != head) {
		...
	} else {
		barrier();
		return ptr;
	}

compilers can replace 'ptr' with 'head' because of the equality, and
even putting barrier() here cannot prevent the compiler from rewriting
the else branch into:

	else {
		barrier();
		return head;
	}

because that's just using a different register, unrelated to memory
accesses.

Jonas, am I missing something subtle? Or is this different from what you
proposed?

Regards,
Boqun

>
> I was concerned about the suggestion from Jonas to use "node2"
> rather than "node" after the equality check as a way to ensure
> the intended register is used to return the pointer, because after
> the SSA GVN optimization pass, AFAIU this won't help in any way.
> I have a set of examples below that show gcc use the result of the
> first load, and clang use the result of the second load (on
> both x86-64 and aarch64). Likewise when a load-acquire is used as
> second load, which I find odd. Hopefully mixing this optimization
> from gcc with speculation still abide by the memory model.
>
> Only the asm goto approach ensures that gcc uses the result from
> the second load.
> [...]
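
For anyone who wants to look at the generated code outside the kernel
tree, here is a minimal userspace sketch of the load/compare pattern
discussed above. The macro definitions are simplified stand-ins and not
the kernel's, hp_load() and its parameters are hypothetical names, and
the body of the 'ptr != head' branch (elided as "..." above) is only a
placeholder assumption:

```c
#include <stddef.h>

/* Simplified userspace stand-ins (assumptions, not the kernel definitions). */
#define READ_ONCE(x)      (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))
#define barrier()         __asm__ __volatile__("" ::: "memory")
#define smp_mb()          __atomic_thread_fence(__ATOMIC_SEQ_CST)

/*
 * hp_load() is a hypothetical name. It publishes the loaded pointer in
 * *slot, re-reads the source, and returns the pointer only if the two
 * loads agree.
 */
static void *hp_load(void **pp, void **slot)
{
	void *head, *ptr;

	head = READ_ONCE(*pp);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(*pp);
	if (ptr != head) {
		/* Placeholder only: the original elides this branch. */
		WRITE_ONCE(*slot, NULL);
		return NULL;
	} else {
		barrier();
		/*
		 * barrier() only constrains memory accesses. Since
		 * ptr == head is known on this path, the compiler may
		 * still return the register holding 'head'.
		 */
		return ptr;
	}
}
```

Building this at -O2 with gcc and with clang and comparing which
register feeds the return value is one way to check the behaviour on a
given compiler version; the result is not guaranteed either way.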