Re: [RFC PATCH] fs: elide the smp_rmb fence in fd_install()

Christian Brauner <brauner@xxxxxxxxxx> · Fri, 6 Dec 2024 13:11:58 +0100

On Thu, Dec 05, 2024 at 12:11:57PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2024 at 08:47:24PM +0100, Mateusz Guzik wrote:
> > On Thu, Dec 5, 2024 at 8:26 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, 5 Dec 2024 at 10:41, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > > To my understanding this is the idiomatic way of spelling out the
> > > > > non-existent in Linux smp_consume_load, for the resize_in_progress
> > > > > flag.
> > > >
> > > > In Linus, "smp_consume_load()" is named rcu_dereference().
> > >
> > > Linux.
> > >
> > > But yes and no.
> > >
> > > It's worth making it really really clear that "rcu_dereference()" is
> > > *not* just a different name for some "smp_consume_load()" operation.
> > >
> > > Why? Because a true smp_consume_load() would work with any random kind
> > > of flags etc. And rcu_dereference() works only because it's a pointer,
> > > and there's an inherent data dependency to what the result points to.
> > >
> > > Paul obviously knows this, but let's make it very clear in this
> > > discussion, because if somebody decided "I want a smp_consume_load(),
> > > and I'll use rcu_dereference() to do that", the end result would
> > > simply not work for arbitrary data, like a flags field or something,
> > > where comparing it against a value will only result in a control
> > > dependency, not an actual data dependency.
> > 
> > So I checked for kicks and rcu_dereference comes with type checking,
> > as in passing something which is not a pointer even fails to compile.
> 
> That is by design, keeping in mind that consume loads order only
> later dereferences against the pointer load.
> 
> > I'll note thought that a smp_load_consume_ptr or similarly named
> > routine would be nice and I'm rather confused why it was not added
> > given smp_load_acquire and smp_store_release being there.
> 
> In recent kernels, READ_ONCE() is your smp_load_consume_ptr().  Or things
> like rcu_dereference_check(p, 1).  But these can be used only when the
> pointed-to object is guaranteed to live (as in not be freed) for the
> full duration of the read-side use of that pointer.

Both true in the case of mnt_idmap(). All mounts start with mnt_idmap set to:
extern struct mnt_idmap nop_mnt_idmap which doesn't go away ever. And we
only allow to change the idmaping of a mount once. 
So the READ_ONCE() will always retrieve an object that is guaranteed to
stay alive for at least as long as the mount stays alive (in the
nop_mnt_idmap case obviously "forever").

I wanted to avoid a) pushing complicated RCU dances all through
filesystems and the VFS and b) taking any reference counts whatsoever on
mnt_idmap (other than sharing the same mnt_idmap between different
mounts at creation time). (Though I do have long-standing ideas how that
would work without changing the mnt_idmap pointer.).