Re: [RFC] why do we need smp_rmb/smp_wmb pair in fd_install()/expand_fdtable()?

Mateusz Guzik <mjguzik@xxxxxxxxx> · Thu, 8 Aug 2024 08:08:00 +0200

On Thu, Aug 8, 2024 at 5:46 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 08, 2024 at 04:35:05AM +0100, Al Viro wrote:
> > On Wed, Aug 07, 2024 at 08:06:31PM -0700, Linus Torvalds wrote:
>
> > > But release/acquire is the RightThing(tm), and the fact that alpha
> > > based its ordering on the bad old model is not really our problem.
> >
> > alpha would have fuckloads of full barriers simply from all those READ_ONCE()
> > in rcu reads...
> >
> > smp_rmb() is on the side that is much hotter - fd_install() vs. up to what, 25 calls
> > of expand_fdtable() per files_struct instance history in the worst possible case?
> > With rather big memcpy() done by those calls, at that...
>
> BTW, an alternative would be to have LSB of ->fdt (or ->fd, if we try to
> eliminate that extra dereference) for ->resize_in_progress.  Then no barrier
> is needed for ordering of those.  Would cost an extra &~1 on ->fdt fetches,
> though...
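
To make the LSB idea concrete, here is a rough userspace sketch of packing a
flag into the low bit of a pointer word. struct fdtable_model, pack(),
unpack_ptr() and unpack_resizing() are made-up names for illustration, not
anything in fs/file.c, and it assumes the pointed-to object is at least
2-byte aligned so bit 0 is free:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fdtable; only alignment matters here. */
struct fdtable_model {
	unsigned int max_fds;
};

/* Bit 0 of the pointer word carries the "resize in progress" flag. */
#define RESIZE_BIT ((uintptr_t)1)

/* The trick only works if bit 0 of a valid pointer is always zero. */
static_assert(_Alignof(struct fdtable_model) >= 2, "LSB must be free");

/* Combine pointer and flag into a single word. */
static inline uintptr_t pack(struct fdtable_model *fdt, bool resizing)
{
	return (uintptr_t)fdt | (resizing ? RESIZE_BIT : 0);
}

/* The extra "& ~1" that every pointer fetch would have to pay. */
static inline struct fdtable_model *unpack_ptr(uintptr_t word)
{
	return (struct fdtable_model *)(word & ~RESIZE_BIT);
}

/* The flag comes from the same word, so no ordering between two loads. */
static inline bool unpack_resizing(uintptr_t word)
{
	return (word & RESIZE_BIT) != 0;
}
```

Since the flag and the pointer are read with one load, a reader cannot
observe a new pointer with a stale flag or the other way around.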

Note that smp_load_acquire() still emits an ordered load (ldar) rather than
a plain one on arm64, and I have no idea what that costs.

While my understanding of the RCU guarantees in the face of
synchronize_rcu() is rather limited, I do suspect the entire thing can be
handled with a consume load, which expands to a regular load on everything
but alpha.
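
For illustration only, a userspace C11 sketch of the dependency-ordered
pattern I have in mind. struct table, publish() and read_capacity() are
made-up names, not kernel code. Note that current compilers promote
memory_order_consume to acquire, so this shows the intent rather than the
instructions the kernel primitives would generate:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object that gets published through a shared pointer. */
struct table {
	int capacity;
};

/* Zero-initialised, i.e. NULL until something is published. */
static _Atomic(struct table *) published;

/* Writer: fill in the object first, then publish it with release. */
static void publish(struct table *t, int capacity)
{
	assert(t != NULL);
	t->capacity = capacity;
	atomic_store_explicit(&published, t, memory_order_release);
}

/*
 * Reader: consume load of the pointer. The read of ->capacity carries an
 * address dependency on the loaded value, which is what orders it after
 * the writer's initialisation.
 */
static int read_capacity(void)
{
	struct table *t = atomic_load_explicit(&published,
					       memory_order_consume);

	return t ? t->capacity : -1;
}
```

The ordering only covers accesses that depend on the loaded pointer; an
unrelated load after it gets no guarantee from consume alone.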

So the question for Paul is whether, given this in expand_fdtable():

        files->resize_in_progress = true;
        ....
        if (atomic_read(&files->count) > 1)
                synchronize_rcu();

something like this would work for fd_install():

        rcu_read_lock_sched();
        files = smp_load_consume(current->files);
        if (unlikely(files->resize_in_progress))
                ....
        fdt = rcu_dereference_sched(files->fdt);
        rcu_assign_pointer(fdt->fd[fd], file);
        rcu_read_unlock_sched();

-- 
Mateusz Guzik <mjguzik gmail.com>