On 9/26/2024 at 6:12 PM, Linus Torvalds wrote:
> On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> <jonas.oberhauser@xxxxxxxxxxxxxxx> wrote:
>>
>> No, the issue introduced by the compiler optimization (or by your
>> original patch) is that the CPU can speculatively load from the first
>> pointer as soon as it has completed the load of that pointer:
>
> You mean the compiler can do it.

What I mean is that if we only use rcu_dereference for the second load
(and not either some form of compiler barrier or an acquire load), then
the compiler can transform the second program from my previous e-mail
(which if mapped 1:1 to hardware would be correct because hardware
ensures the ordering based on the address dependency) into the first one
(which is incorrect).
In particular, the compiler can change

	if (node == node2) t = *node2;

into

	if (node == node2) t = *node;

and then the CPU can speculatively read *node before knowing the value
of node2.
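A minimal C sketch of the two forms above; struct node and both function names are hypothetical, chosen only for illustration. Run in a single thread, the two functions always produce the same result, which is exactly why the compiler is allowed to emit the second for the first. The difference lies only in which pointer the final load's address is computed from, and therefore in what weakly ordered hardware may do with it.

```c
/* Hypothetical node type used only for this illustration. */
struct node {
	int val;
};

/* As written: the load goes through node2, so on the hardware its
 * address depends on the (second) load that produced node2. */
static int read_as_written(struct node *node, struct node *node2, int *t)
{
	if (node == node2) {
		*t = node2->val;
		return 1;
	}
	return 0;
}

/* What the compiler may legally emit: inside the branch the pointers
 * are known to be equal, so it may dereference node instead. The load
 * no longer depends on node2, so the CPU may issue it before node2 is
 * known. */
static int read_as_optimized(struct node *node, struct node *node2, int *t)
{
	if (node == node2) {
		*t = node->val;
		return 1;
	}
	return 0;
}
```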
The compiler can also speculatively read *node in this case, but that is
not what I meant.
The code in Mathieu's original patch already has the latter form and
is broken even if the compiler does not do any optimizations.

> The inline asm has no impact on what
> the CPU does. The conditional isn't a barrier for the actual hardware.
> But once the compiler doesn't try to do it, the data dependency on the
> address does end up being an ordering constraint on the hardware too

Exactly. The inline asm would prevent the compiler from doing the
transformation though, which would mean that the address dependency
appears in the final compiler output.

> Just use a barrier. Or make sure to use the proper ordered memory
> accesses when possible.
>
> Don't use an inline asm for the compare - we
> don't even have anything insane like that as a portable helper, and we
> shouldn't have it.

I'm glad you say that :))
I would also just use a barrier before returning the pointer.
Boqun seems to be unhappy with a barrier though, because it would
theoretically also forbid unrelated optimizations.
But I have not seen any evidence that there are any unrelated
optimizations going on in the first place that would be forbidden by this.
Have fun,
jonas