On Sun, 9 Feb 2025 11:32:32 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sun, 9 Feb 2025 at 11:10, David Laight <david.laight.linux@xxxxxxxxx> wrote:
> >
> > +#define barrier_nospec() __rmb()
>
> This is one of those "it happens to work, but it's wrong" things.
>
> Just make it explicit that it's "lfence" in the current implementation.

Easily done.

Any idea what the one used to synchronise rdtsc should be?
'lfence' is the right instruction (give or take), but it isn't
a speculation issue.
It really is 'wait for all memory accesses to finish' to give
a sensible(ish) answer for cycle timing.
And on an old cpu you want nothing - not a locked memory access.

> > Is __rmb() also an lfence?
>
> Yes. And that's actually very confusing too. Because on x86, a
> regular read barrier is a no-op, and the "main" rmb definition is
> actually this:
>
>   #define __dma_rmb()     barrier()
>   #define __smp_rmb()     dma_rmb()
>
> so that it's only a compiler barrier.

I couldn't work out why __smp_mb() is so much stronger than the rmb()
and wmb() forms - I presume there is history there I wasn't looking for.

> And yes, __rmb() exists as the architecture-specific helper for "I
> need to synchronize with unordered IO accesses" and is purely about
> driver IO.

I'd missed the history of it being IO related.

...
> And some day in the future, maybe even that implementation equivalence
> ends up going away again, and we end up with new barrier instructions
> that depend on new CPU capabilities (or fake software capabilities:
> kernel bootup flags that say "don't bother with the nospec
> barriers").

Actually there is already the cpu flag to treat addresses with the top
bit set as 'supervisor' in the initial address decode - rather than
checking the page table in parallel with the d-cache accesses.
When that hits real silicon then patching out the barrier_nospec()
lfence would make sense.

There is also your kernel build machine where you don't care.
So compiling them out or boot patching them out is a real option.

This does make it more clear that the rdtsc code has the wrong barrier.

> So please keep the __rmb() and the barrier_nospec() separate, don't
> tie them together. They just have *soo* many differences, both
> conceptual and practical.

A simple V2 :-)

	David

>
>              Linus
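
The separation discussed in this mail can be sketched in user space roughly as follows. This is a minimal illustration under stated assumptions, not the kernel's code: rdtsc_order_barrier(), load_checked() and read_cycles_ordered() are hypothetical names invented for the example, and the non-x86 fallback is an assumption made only so the sketch builds elsewhere. The point is that the speculation barrier and the rdtsc ordering barrier get their own definitions even though both currently expand to lfence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier: constrains the compiler, emits no instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
/* Speculation barrier, spelled out as lfence instead of borrowing __rmb(). */
#define barrier_nospec() __asm__ __volatile__("lfence" ::: "memory")
/* Hypothetical name: orders rdtsc after earlier loads. Same instruction
 * today, but a different purpose, so it has its own definition. */
#define rdtsc_order_barrier() __asm__ __volatile__("lfence" ::: "memory")
#else
/* Assumption for this sketch only: fall back to a compiler barrier. */
#define barrier_nospec() barrier()
#define rdtsc_order_barrier() barrier()
#endif

/* Bounds-checked load that does not speculate past the check.
 * Returns 0 and stores the element in *out, or -1 if idx is out of range. */
static inline int load_checked(const int *table, size_t len, size_t idx, int *out)
{
    assert(table != NULL && out != NULL);
    if (idx >= len)
        return -1;
    barrier_nospec();   /* the load below must not run ahead of the check */
    *out = table[idx];
    return 0;
}

/* Cycle counter read that waits for earlier loads to complete first.
 * Returns 0 on non-x86 builds, where this sketch has no counter to read. */
static inline uint64_t read_cycles_ordered(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;

    rdtsc_order_barrier();
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    rdtsc_order_barrier();
    return 0;
#endif
}
```

With the two macros kept apart, compiling out or patching out barrier_nospec() would leave the rdtsc ordering untouched, and vice versa.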