On Wed, Aug 07, 2024 at 08:06:31PM -0700, Linus Torvalds wrote:
> That should be fine. smp_store_release()->smp_load_acquire() is the
> more modern model, and the better one. But I think we simply have a
> long history of using the old smp_wmb()->smp_rmb() model, so we have a
> lot of code that does that.
>
> On x86, there's basically no difference - in all cases it ends up
> being just an instruction scheduling barrier.
>
> On arm64, store_release->load_acquire is likely better, but obviously
> micro-architectural implementation issues might make it a wash.
>
> On other architectures, there probably isn't a huge difference, but
> acquire/release can be more expensive if the architecture is
> explicitly designed for the old-style rmb/wmb model.
>
> So on alpha, for example, store_release->load_acquire ends up being a
> full memory barrier in both cases (rmb is always a full memory barrier
> on alpha), which is hugely more expensive than wmb (well, again, in
> theory this is all obviously dependent on microarchitectures, but wmb
> in particular is very cheap unless the uarch really screwed the pooch
> and just messed up its barriers entirely).
>
> End result: wmb/rmb is usually never _horrific_, while release/acquire
> can be rather expensive on bad machines.
>
> But release/acquire is the RightThing(tm), and the fact that alpha
> based its ordering on the bad old model is not really our problem.

alpha would have fuckloads of full barriers simply from all those
READ_ONCE() in rcu reads...

smp_rmb() is on the side that is much hotter - fd_install() vs.
up to what, 25 calls of expand_fdtable() per files_struct instance
history in the worst possible case?  With rather big memcpy() done
by those calls, at that...
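
For illustration only, a userspace sketch of the two pairings discussed
above, written with C11 atomics rather than the kernel primitives. The
struct table, publish_*() and read_*() names are made up for this example
and are not the fdtable code. The C11 fences are only rough analogues of
smp_wmb()/smp_rmb() (they order somewhat more than the kernel barriers do),
so this shows the shape of the pattern, not what any architecture emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload: stands in for any structure that is filled in
 * first and then made visible to readers through a shared pointer. */
struct table {
	int nr_slots;
};

/* Shared pointer; zero-initialized, i.e. NULL until something is published. */
static _Atomic(struct table *) current_table;

/* Release/acquire pairing: the ordering is attached to the pointer
 * store and the pointer load themselves. */
static void publish_release(struct table *t, int nr_slots)
{
	assert(t != NULL);
	t->nr_slots = nr_slots;			/* plain store to the payload */
	atomic_store_explicit(&current_table, t, memory_order_release);
}

static int read_acquire(void)
{
	struct table *t = atomic_load_explicit(&current_table,
					       memory_order_acquire);
	return t ? t->nr_slots : -1;		/* -1: nothing published yet */
}

/* Old-style pairing: relaxed accesses to the pointer plus standalone
 * fences. Assumption: the release fence plays the role of smp_wmb()
 * and the acquire fence plays the role of smp_rmb(). */
static void publish_fence(struct table *t, int nr_slots)
{
	assert(t != NULL);
	t->nr_slots = nr_slots;
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&current_table, t, memory_order_relaxed);
}

static int read_fence(void)
{
	struct table *t = atomic_load_explicit(&current_table,
					       memory_order_relaxed);
	if (!t)
		return -1;
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	return t->nr_slots;
}
```

Either reader returns -1 until a table has been published, and the
nr_slots value stored before publication afterwards. The read side is the
one that corresponds to the hot path in the argument above, so that is
where the cost of the chosen barrier matters most.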