Re: [RFC] why do we need smp_rmb/smp_wmb pair in fd_install()/expand_fdtable()?

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 8 Aug 2024 15:20:05 +0200

On Wed, Aug 07, 2024 at 08:06:31PM GMT, Linus Torvalds wrote:
> On Wed, 7 Aug 2024 at 19:50, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > What's the problem with dropping both barriers and turning that
> > into
> >         expanded = expand_fdtable(files, nr);
> >         smp_store_release(&files->resize_in_progress, false);
> > and
> >         if (unlikely(smp_load_acquire(&files->resize_in_progress))) {
> >                 ....
> >                 return;
> >         }
> 
> That should be fine. smp_store_release()->smp_load_acquire() is the
> more modern model, and the better one. But I think we simply have a
> long history of using the old smp_wmb()->smp_rmb() model, so we have a
> lot of code that does that.
> 
> On x86, there's basically no difference - in all cases it ends up
> being just an instruction scheduling barrier.
> 
> On arm64, store_release->load_acquire is likely better, but obviously
> micro-architectural implementation issues might make it a wash.
> 
> On other architectures, there probably isn't a huge difference, but
> acquire/release can be more expensive if the architecture is
> explicitly designed for the old-style rmb/wmb model.
> 
> So on alpha, for example, store_release->load_acquire ends up being a
> full memory barrier in both cases (rmb is always a full memory barrier
> on alpha), which is hugely more expensive than wmb (well, again, in
> theory this is all obviously dependent on microarchitectures, but wmb
> in particular is very cheap unless the uarch really screwed the pooch
> and just messed up its barriers entirely).
> 
> End result: wmb/rmb is usually never _horrific_, while release/acquire
> can be rather expensive on bad machines.
> 
> But release/acquire is the RightThing(tm), and the fact that alpha
> based its ordering on the bad old model is not really our problem.
> 
> So I'm ok with just saying "screw bad memory orderings, go with the
> modern model"

So that's what confused me in your earlier mail in the other thread
where the question around smp_{r,w}mb() and smp_store_release() and
smp_load_acquire() already came up.

Basically, I had always used smp_load_acquire() and smp_store_release()
based on the assumption that they're equivalent to smp_{r,w}mb().

But then multiple times people brought up that supposedly smp_rmb() and
smp_wmb() are cheaper because they only do load or store ordering
whereas smp_{load,store}_{acquire,release}() do load and store ordering.

And it doesn't help that we seemingly don't have a practical guideline
of the form "Generally prefer smp_load_acquire() and smp_store_release()
over smp_rmb() and smp_wmb()." written down anywhere. That really would
shortcut decisions such as this.
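
For illustration only, here is a userspace C11 sketch of the two models
discussed above. It is an analogy, not kernel code: struct table and the
four helper names are hypothetical, and the C11 atomics only approximate
the kernel primitives. In particular C11 has no store-only or load-only
fence, so the release and acquire fences below are stronger than
smp_wmb() and smp_rmb() would be. The writer and reader are meant to run
on different threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the guarded state; not the real files_struct. */
struct table {
	int payload;                    /* plain data published by the writer */
	atomic_bool resize_in_progress; /* true while a resize is running */
};

/* Modern model: the ordering is attached to the flag access itself. */
static void writer_release(struct table *t, int v)
{
	t->payload = v;
	/* Roughly smp_store_release(&t->resize_in_progress, false). */
	atomic_store_explicit(&t->resize_in_progress, false,
			      memory_order_release);
}

static bool reader_acquire(struct table *t, int *out)
{
	/* Roughly smp_load_acquire(&t->resize_in_progress). */
	if (atomic_load_explicit(&t->resize_in_progress,
				 memory_order_acquire))
		return false; /* resize still running, nothing published */
	*out = t->payload;
	return true;
}

/* Older model: relaxed flag accesses plus standalone fences. */
static void writer_fence(struct table *t, int v)
{
	t->payload = v;
	/* Stands in for smp_wmb(); a C11 release fence is stronger. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->resize_in_progress, false,
			      memory_order_relaxed);
}

static bool reader_fence(struct table *t, int *out)
{
	if (atomic_load_explicit(&t->resize_in_progress,
				 memory_order_relaxed))
		return false;
	/* Stands in for smp_rmb(); a C11 acquire fence is stronger. */
	atomic_thread_fence(memory_order_acquire);
	*out = t->payload;
	return true;
}
```

In both variants a reader that sees the flag cleared is guaranteed to
see the payload written before it. The difference is only where the
ordering lives: on the flag access in the first pair, in a separate
fence in the second.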