On Wed, Aug 07, 2024 at 08:06:31PM GMT, Linus Torvalds wrote:
> On Wed, 7 Aug 2024 at 19:50, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > What's the problem with dropping both barriers and turning that
> > into
> >         expanded = expand_fdtable(files, nr);
> >         smp_store_release(&files->resize_in_progress, false);
> > and
> >         if (unlikely(smp_load_acquire(&files->resize_in_progress))) {
> >                 ....
> >                 return;
> >         }
>
> That should be fine. smp_store_release()->smp_load_acquire() is the
> more modern model, and the better one. But I think we simply have a
> long history of using the old smp_wmb()->smp_rmb() model, so we have a
> lot of code that does that.
>
> On x86, there's basically no difference - in all cases it ends up
> being just an instruction scheduling barrier.
>
> On arm64, store_release->load_acquire is likely better, but obviously
> micro-architectural implementation issues might make it a wash.
>
> On other architectures, there probably isn't a huge difference, but
> acquire/release can be more expensive if the architecture is
> explicitly designed for the old-style rmb/wmb model.
>
> So on alpha, for example, store_release->load_acquire ends up being a
> full memory barrier in both cases (rmb is always a full memory barrier
> on alpha), which is hugely more expensive than wmb (well, again, in
> theory this is all obviously dependent on microarchitectures, but wmb
> in particular is very cheap unless the uarch really screwed the pooch
> and just messed up its barriers entirely).
>
> End result: wmb/rmb is usually never _horrific_, while release/acquire
> can be rather expensive on bad machines.
>
> But release/acquire is the RightThing(tm), and the fact that alpha
> based its ordering on the bad old model is not really our problem.
>
> So I'm ok with just saying "screw bad memory orderings, go with the
> modern model"

So that's what confused me in your earlier mail in the other thread
where the question around smp_{r,w}mb() and smp_store_release() and
smp_load_acquire() already came up.

Basically, I had always used smp_load_acquire() and smp_store_release()
based on the assumption that they're equivalent to smp_{r,w}mb(). But
then multiple times people brought up that supposedly smp_rmb() and
smp_wmb() are cheaper because they only do load or store ordering
whereas smp_{load,store}_{acquire,release}() do load and store ordering.

And it doesn't help that we seemingly don't have a practical guideline
of the form "Generally prefer smp_load_acquire() and smp_store_release()
over smp_rmb() and smp_wmb()." written down anywhere. That really would
shortcut decisions such as this.
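
For illustration only, here is a rough userspace sketch of the two
pairings being compared. It uses C11 atomics as a stand-in for the
kernel primitives, which is an assumption on my part: the mapping is
only approximate (C11 release/acquire fences order somewhat more than
smp_wmb()/smp_rmb() do), and struct mailbox and all function names
below are hypothetical, not taken from fs/file.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example type: plain data guarded by a "ready" flag. */
struct mailbox {
	int payload;		/* ordinary, non-atomic data */
	atomic_bool ready;	/* set once payload has been written */
};

static void mailbox_init(struct mailbox *m)
{
	m->payload = 0;
	atomic_init(&m->ready, false);
}

/*
 * Newer style: the ordering is attached to the flag access itself.
 * Roughly what smp_store_release() expresses in the kernel.
 */
static void publish_release(struct mailbox *m, int v)
{
	m->payload = v;
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Pairs with publish_release(); roughly smp_load_acquire(). */
static bool consume_acquire(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}

/*
 * Older style: relaxed flag accesses plus standalone fences, as an
 * approximate analogue of the smp_wmb()/smp_rmb() pairing.
 */
static void publish_fence(struct mailbox *m, int v)
{
	m->payload = v;
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Pairs with publish_fence(). */
static bool consume_fence(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	*out = m->payload;
	return true;
}
```

The intended use is one thread calling a publish_*() helper and another
polling the matching consume_*() helper until it returns true. Either
pairing guarantees that a reader which sees the flag set also sees the
payload written before it. The release/acquire variant simply ties the
ordering to the flag access instead of to a separate barrier.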