Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxx> · Thu, 27 Jul 2023 10:11:29 -0700

On Thu, 27 Jul 2023 at 08:44, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> This reads a little oddly, perhaps because it's a fragment from a larger
> piece of code.

Yes. As Jann already said, this is basically a preparatory step in a
much longer sequence, and the code simply wants to make sure that any
later code (possibly quite a bit later) will not see a NULL value.

I do believe it happens to be safe to use READ_ONCE() for a number of
reasons, but I will argue that we should *never* use a bare READ_ONCE
if there is any actual real question about what any memory ordering
might be.

And yes, the RCU code obviously does use READ_ONCE(), so that's why I
say "bare" - the RCU code wraps it in helper accessors with strict
semantics.

The reason I think it would be techncially *safe* to do here is that

 (a) code like this:

        if (READ_ONCE(..))

     may end up having the conditional speculated by the CPU, and have
the actual read done without any ordering by the CPU, but we do have
*one* guarantee: the memory load instruction will be before the
conditional branch (or whatever) in the instruction stream.

     So the code may be *executed* out of order, but we know the
memory load can not be moved after the conditional (whatever form that
conditional then takes) by a compiler.

     (We do have various barriers like "rcu_read_unlock()" that
guarantees that the READ_ONCE() couldn't be moved lmuch later even in
the absence of the conditional, but we can ignore that).

 (b) the later use of the anon_vma (that depends on the value being
stable) is *so* much later, and behind things that the compiler sees
as barriers (again, that rcu_read_unlock() acts at a minimum as a
compiler barrier) that any subsequent use would not have its load
moved down to before the READ_ONCE() in the instruction stream.

     Again, this is purely a "location in the instruction stream"
ordering argument, not a "execution order" ordering argument.

And finally

 (c) I do think that all our architectures have the rules that when
you read from the *same* memory location from the same CPU, the
accesses are ordered.

Now, I didn't actually find (c) spelled out anywhere, but even alpha -
the worst memory ordering ever devised - had that particular rule (the
alpha architecture manual calls it "Location Access Order").

Now, with that said, I did argue to Jann that we should use
smp_store_release and smp_load_acquire pairs here, because honestly,
the above (a)-(c) argument is too damn subtle for me, and I don't
think this particular use requires it.

With smp_load_acquire(), you don't need any of the (a)-(c) rules. The
rule is "the load is done before any subsequent memory operations".
End of story.

So while I do think READ_ONCE() is sufficient here, I actually think
that once you start going down that path you can argue that
READ_ONCE() is actually entirely unnecessary, because we also have
other ordering rules that mean that the compiler can't really do
anythinig else even *without* the READ_ONCE().

End result: I can trivially extend the (a)-(c) to say "even
READ_ONCE() isn't strictly necessary here, because even any access
tearing - which won't happen anyway - wouldn't actually change the
result.

So if we want to make it *obvious* that it's safe, we should use
smp_load_acquire().

And if we do find that there are situations where we care so much
about the (generally fairly cheap) possible additional
synchronization, and we really want to use READ_ONCE() rather than
smp_load_acquire(), I'd rather try to wrap a READ_ONCE in some kind of
better abstraction (maybe make the conditional part of the operation,
and make it clear that you are doing a "one-time test which depends on
the same-location rule".

Do we even have the same-location rule in the LKMM?

                       Linus