Re: [RFC] why do we need smp_rmb/smp_wmb pair in fd_install()/expand_fdtable()?

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Thu, 8 Aug 2024 04:35:05 +0100

On Wed, Aug 07, 2024 at 08:06:31PM -0700, Linus Torvalds wrote:

> That should be fine. smp_store_release()->smp_load_acquire() is the
> more modern model, and the better one. But I think we simply have a
> long history of using the old smp_wmb()->smp_rmb() model, so we have a
> lot of code that does that.
> 
> On x86, there's basically no difference - in all cases it ends up
> being just an instruction scheduling barrier.
> 
> On arm64, store_release->load_acquire is likely better, but obviously
> micro-architectural implementation issues might make it a wash.
> 
> On other architectures, there probably isn't a huge difference, but
> acquire/release can be more expensive if the architecture is
> explicitly designed for the old-style rmb/wmb model.
> 
> So on alpha, for example, store_release->load_acquire ends up being a
> full memory barrier in both cases (rmb is always a full memory barrier
> on alpha), which is hugely more expensive than wmb (well, again, in
> theory this is all obviously dependent on microarchitectures, but wmb
> in particular is very cheap unless the uarch really screwed the pooch
> and just messed up its barriers entirely).
> 
> End result: wmb/rmb is usually never _horrific_, while release/acquire
> can be rather expensive on bad machines.
> 
> But release/acquire is the RightThing(tm), and the fact that alpha
> based its ordering on the bad old model is not really our problem.

alpha would have fuckloads of full barriers simply from all those READ_ONCE()
in rcu reads...

smp_rmb() is on the side that is much hotter - fd_install() vs. up to what, 25 calls
of expand_fdtable() over the lifetime of a files_struct instance in the worst possible
case?  With a rather big memcpy() done by each of those calls, at that...
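
For reference, the two pairings under discussion can be sketched in userspace with
C11 atomics.  This is only an analogue, not the fs/file.c code: struct table,
cur_table and the four helpers below are made-up names for illustration, and the C11
fences are somewhat stronger than the kernel's smp_wmb()/smp_rmb(), so the fence
variant only approximates the old model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a table that gets replaced by a bigger copy. */
struct table {
	size_t max;	/* number of usable slots */
	int *slots;	/* backing array, filled in before publication */
};

/* Pointer that readers follow; statically zero-initialized, i.e. NULL. */
static _Atomic(struct table *) cur_table;

/* New style: the store itself carries release semantics. */
static void publish_release(struct table *t)
{
	assert(t != NULL);
	atomic_store_explicit(&cur_table, t, memory_order_release);
}

/* New style: the load itself carries acquire semantics. */
static struct table *read_acquire(void)
{
	return atomic_load_explicit(&cur_table, memory_order_acquire);
}

/*
 * Old style, roughly "fill table; smp_wmb(); store pointer".
 * Assumption: a release fence stands in for smp_wmb() here; it orders
 * more than a pure write barrier does.
 */
static void publish_fence(struct table *t)
{
	assert(t != NULL);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&cur_table, t, memory_order_relaxed);
}

/*
 * Old style, roughly "load pointer; smp_rmb(); read table".
 * Assumption: an acquire fence stands in for smp_rmb().
 */
static struct table *read_fence(void)
{
	struct table *t;

	t = atomic_load_explicit(&cur_table, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return t;
}
```

In both variants the writer fills in max and slots before publishing, and a reader
that sees the new pointer is then guaranteed to see those contents as well.  The
barrier on the reader side is paid on every lookup, which is why the cost of the
read-side primitive matters far more than the cost of the write-side one.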