On Thu, Aug 8, 2024 at 5:46 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 08, 2024 at 04:35:05AM +0100, Al Viro wrote:
> > On Wed, Aug 07, 2024 at 08:06:31PM -0700, Linus Torvalds wrote:
> >
> > > But release/acquire is the RightThing(tm), and the fact that alpha
> > > based its ordering on the bad old model is not really our problem.
> >
> > alpha would have fuckloads of full barriers simply from all those READ_ONCE()
> > in rcu reads...
> >
> > smp_rmb() is on the side that is much hotter - fd_install() vs. up to what, 25 calls
> > of expand_fdtable() per files_struct instance history in the worst possible case?
> > With rather big memcpy() done by those calls, at that...
>
> BTW, an alternative would be to have LSB of ->fdt (or ->fd, if we try to
> eliminate that extra dereference) for ->resize_in_progress. Then no barrier
> is needed for ordering of those. Would cost an extra &~1 on ->fdt fetches,
> though...

Note smp_load_acquire still emits a fenced instruction on arm64, but I
have no idea what the cost is.

While my understanding of RCU guarantees in face of synchronize_rcu is
rather limited here, I do suspect the entire thing can be handled with a
consume fence, which does expand to a regular load on everything but
alpha.

So the question to Paul is if given this in expand_fdtable:

        files->resize_in_progress = true;
        ....
        if (atomic_read(&files->count) > 1)
                synchronize_rcu();

does something like this work for fd_install:

        rcu_read_lock_sched();
        files = smp_load_consume(current->files);
        if (unlikely(files->resize_in_progress))
                ....
        fdt = rcu_dereference_sched(files->fdt);
        rcu_assign_pointer(fdt->fd[fd], file);
        rcu_read_unlock_sched();

--
Mateusz Guzik <mjguzik gmail.com>
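
For reference, here is a rough userspace model of the LSB idea quoted
above, written with C11 atomics rather than kernel primitives. It is only
a sketch: struct fdtable_model, struct files_model, load_fdt(),
set_resizing() and publish_fdt() are made-up names, not anything in
fs/file.c, and the acquire load is a conservative stand-in for whatever
dependency-ordered load the kernel would actually use. The point it tries
to show is that once the flag travels in the same word as the pointer, a
single load yields both, so there is no second load left to order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fd table; not the kernel layout. */
struct fdtable_model {
	unsigned int max_fds;
};

/* Hypothetical stand-in for files_struct: the table pointer and the
 * "resize in progress" flag share one word, flag in bit 0. */
struct files_model {
	_Atomic uintptr_t fdt_tagged;
};

#define RESIZE_BIT ((uintptr_t)1)

/* Reader side: one load returns pointer and flag together. */
static struct fdtable_model *load_fdt(struct files_model *files, bool *resizing)
{
	uintptr_t v = atomic_load_explicit(&files->fdt_tagged,
					   memory_order_acquire);

	*resizing = (v & RESIZE_BIT) != 0;
	/* This mask is the "extra &~1" cost on every fetch. */
	return (struct fdtable_model *)(v & ~RESIZE_BIT);
}

/* Resizer side: mark the current table as being resized. */
static void set_resizing(struct files_model *files)
{
	atomic_fetch_or_explicit(&files->fdt_tagged, RESIZE_BIT,
				 memory_order_release);
}

/* Resizer side: install the new table; storing an untagged pointer
 * clears the flag in the same store. */
static void publish_fdt(struct files_model *files, struct fdtable_model *new_fdt)
{
	/* Bit 0 must be free, i.e. the table is at least 2-byte aligned. */
	assert(((uintptr_t)new_fdt & RESIZE_BIT) == 0);
	atomic_store_explicit(&files->fdt_tagged, (uintptr_t)new_fdt,
			      memory_order_release);
}
```

This does not model the synchronize_rcu() wait or the slow path taken
when the flag is seen set; it only covers the pointer/flag packing.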