Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire

On Wed, 18 Sept 2024 at 13:15, Christoph Lameter (Ampere) <cl@xxxxxxxxxx> wrote:
>
> Other arches do not have acquire / release and will create additional
> barriers in the fallback implementation of smp_load_acquire. So it needs
> to be an arch config option.

Actually, I looked at a few cases, and it doesn't really seem to be true.

For example, powerpc doesn't have a "native" acquire model, but both
smp_load_acquire() and smp_rmb() end up being LWSYNC after the load
(which in the good case is a "lwsync" instruction, in the bad case it's a
heavier "sync" instruction on older cores, but the point is that it's
the same thing for smp_rmb() and for smp_load_acquire()).

So on powerpc, smp_load_acquire() isn't any better than
"READ_ONCE()+smp_rmb()", but it also isn't any worse.
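
To make the two forms concrete, here is a minimal userspace sketch that uses
C11 atomics as a stand-in for the kernel primitives. The helper names are
hypothetical, and the mapping is only approximate: an acquire fence is
somewhat stronger than smp_rmb(), because it also orders the earlier load
against later stores.

```c
#include <stdatomic.h>

/* Hypothetical helper: rough analogue of READ_ONCE() followed by smp_rmb(). */
static inline unsigned int load_then_fence(atomic_uint *p)
{
	unsigned int v = atomic_load_explicit(p, memory_order_relaxed);

	/* Assumption: an acquire fence stands in for smp_rmb() here. */
	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* Hypothetical helper: rough analogue of smp_load_acquire(). */
static inline unsigned int load_acquire(atomic_uint *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}
```

Both helpers return the same value and both keep later loads from being
satisfied before the load of *p. What differs per architecture is only
which instructions the compiler emits for each form.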

And at least alpha is the same - it doesn't have smp_load_acquire(),
and it falls back on a full memory barrier for that case - but that's
what smp_rmb() is too. However, because READ_ONCE() on alpha already
contains a smp_mb(), it turns out that on alpha having "READ_ONCE +
smp_rmb()" actually results in *two* barriers, while a
"smp_load_acquire()" is just one.

And obviously technically x86 doesn't have explicit acquire, but with
every load being an acquire, it's a no-op either way.

So on at least three very different architectures, smp_load_acquire()
is at least no worse than READ_ONCE() followed by a smp_rmb(). And on
alpha and arm64, it's better.

So it does look like making it conditional doesn't actually buy us
anything. We might as well just unconditionally use the
smp_load_acquire() over "READ_ONCE+smp_rmb".
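
As an illustration of what the unconditional form looks like on the reader
side, here is a userspace sketch of a sequence-counter reader written with
C11 atomics. It is not the kernel's seqcount code: the struct and every
demo_* name are invented for this example, the data fields are atomics only
to keep the sketch free of data races, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-word record protected by a sequence counter. */
struct demo_seq {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int a;
	atomic_int b;
};

/* Begin a read: one acquire load instead of "plain load + read barrier". */
static inline unsigned int demo_read_begin(struct demo_seq *s)
{
	unsigned int seq;

	/* Spin while a writer is active (odd count). */
	while ((seq = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
		;
	return seq;
}

/* Returns true if the data read since demo_read_begin() must be discarded. */
static inline bool demo_read_retry(struct demo_seq *s, unsigned int start)
{
	/* Order the data loads before the re-check of the counter. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Single-writer update; callers must serialize writers themselves. */
static inline void demo_write(struct demo_seq *s, int a, int b)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1) == 0);	/* no write may already be in progress */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	/* Keep the odd count visible before any of the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->a, a, memory_order_relaxed);
	atomic_store_explicit(&s->b, b, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Read a consistent (a, b) pair, retrying if a writer interfered. */
static inline void demo_read(struct demo_seq *s, int *a, int *b)
{
	unsigned int start;

	do {
		start = demo_read_begin(s);
		*a = atomic_load_explicit(&s->a, memory_order_relaxed);
		*b = atomic_load_explicit(&s->b, memory_order_relaxed);
	} while (demo_read_retry(s, start));
}
```

The point of interest is demo_read_begin(): the ordering between the counter
load and the data loads comes from the acquire load itself, so there is no
separate barrier to make conditional on the architecture.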

Other random architectures from a quick look:

RISC-V technically turns smp_rmb() into a "fence r,r", while a
smp_load_acquire() ends up being a "fence r,rw", so technically the
fences are different. But honestly, any microarchitecture that makes
those two be different is just crazy garbage (there's never any valid
reason to move later writes up before earlier reads).

LoongArch has acquire and is better off with it.

parisc has acquire and is better off with it.

s390 and sparc64 are like x86, in that it's just a compiler barrier either way.

End result: let's just simplify the patch and make it entirely unconditional.

                 Linus



