On Wed, 04 Sep 2024, Chuck Lever III wrote: > > > > On Sep 3, 2024, at 11:29 AM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote: > > > > On Tue, Sep 03, 2024 at 11:19:45AM -0400, Jeff Layton wrote: > >> On Tue, 2024-09-03 at 11:00 -0400, Mike Snitzer wrote: > >>> On Tue, Sep 03, 2024 at 10:40:28AM -0400, Jeff Layton wrote: > >>>> On Tue, 2024-09-03 at 10:34 -0400, Chuck Lever wrote: > >>>>> On Sat, Aug 31, 2024 at 06:37:36PM -0400, Mike Snitzer wrote: > >>>>>> From: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> > >>>>>> > >>>>>> Add server support for bypassing NFS for localhost reads, writes, and > >>>>>> commits. This is only useful when both the client and server are > >>>>>> running on the same host. > >>>>>> > >>>>>> If nfsd_open_local_fh() fails then the NFS client will both retry and > >>>>>> fallback to normal network-based read, write and commit operations if > >>>>>> localio is no longer supported. > >>>>>> > >>>>>> Care is taken to ensure the same NFS security mechanisms are used > >>>>>> (authentication, etc) regardless of whether localio or regular NFS > >>>>>> access is used. The auth_domain established as part of the traditional > >>>>>> NFS client access to the NFS server is also used for localio. Store > >>>>>> auth_domain for localio in nfsd_uuid_t and transfer it to the client > >>>>>> if it is local to the server. > >>>>>> > >>>>>> Relative to containers, localio gives the client access to the network > >>>>>> namespace the server has. This is required to allow the client to > >>>>>> access the server's per-namespace nfsd_net struct. > >>>>>> > >>>>>> This commit also introduces the use of NFSD's percpu_ref to interlock > >>>>>> nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is > >>>>>> not destroyed while in use by nfsd_open_local_fh and other LOCALIO > >>>>>> client code. > >>>>>> > >>>>>> CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. > >>>>>> > >>>>>> Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> > >>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > >>>>>> Co-developed-by: Mike Snitzer <snitzer@xxxxxxxxxx> > >>>>>> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx> > >>>>>> Co-developed-by: NeilBrown <neilb@xxxxxxx> > >>>>>> Signed-off-by: NeilBrown <neilb@xxxxxxx> > >>>>>> > >>>>>> Not-Acked-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > >>>>>> Not-Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > >>>>>> --- > >>>>>> fs/nfsd/Makefile | 1 + > >>>>>> fs/nfsd/filecache.c | 2 +- > >>>>>> fs/nfsd/localio.c | 112 +++++++++++++++++++++++++++++++++++++ > >>>>>> fs/nfsd/netns.h | 4 ++ > >>>>>> fs/nfsd/nfsctl.c | 25 ++++++++- > >>>>>> fs/nfsd/trace.h | 3 +- > >>>>>> fs/nfsd/vfs.h | 2 + > >>>>>> include/linux/nfslocalio.h | 8 +++ > >>>>>> 8 files changed, 154 insertions(+), 3 deletions(-) > >>>>>> create mode 100644 fs/nfsd/localio.c > >>>>>> > >>>>>> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile > >>>>>> index b8736a82e57c..18cbd3fa7691 100644 > >>>>>> --- a/fs/nfsd/Makefile > >>>>>> +++ b/fs/nfsd/Makefile > >>>>>> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o > >>>>>> nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o > >>>>>> nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o > >>>>>> nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o > >>>>>> +nfsd-$(CONFIG_NFS_LOCALIO) += localio.o > >>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > >>>>>> index 89ff380ec31e..348c1b97092e 100644 > >>>>>> --- a/fs/nfsd/filecache.c > >>>>>> +++ b/fs/nfsd/filecache.c > >>>>>> @@ -52,7 +52,7 @@ > >>>>>> #define NFSD_FILE_CACHE_UP (0) > >>>>>> > >>>>>> /* We only care about NFSD_MAY_READ/WRITE for this cache */ > >>>>>> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) > >>>>>> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO) > >>>>>> > >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); > >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); > >>>>>> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c > >>>>>> new file mode 100644 > >>>>>> index 000000000000..75df709c6903 > >>>>>> --- /dev/null > >>>>>> +++ b/fs/nfsd/localio.c > >>>>>> @@ -0,0 +1,112 @@ > >>>>>> +// SPDX-License-Identifier: GPL-2.0-only > >>>>>> +/* > >>>>>> + * NFS server support for local clients to bypass network stack > >>>>>> + * > >>>>>> + * Copyright (C) 2014 Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> > >>>>>> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > >>>>>> + * Copyright (C) 2024 Mike Snitzer <snitzer@xxxxxxxxxxxxxxx> > >>>>>> + * Copyright (C) 2024 NeilBrown <neilb@xxxxxxx> > >>>>>> + */ > >>>>>> + > >>>>>> +#include <linux/exportfs.h> > >>>>>> +#include <linux/sunrpc/svcauth.h> > >>>>>> +#include <linux/sunrpc/clnt.h> > >>>>>> +#include <linux/nfs.h> > >>>>>> +#include <linux/nfs_common.h> > >>>>>> +#include <linux/nfslocalio.h> > >>>>>> +#include <linux/string.h> > >>>>>> + > >>>>>> +#include "nfsd.h" > >>>>>> +#include "vfs.h" > >>>>>> +#include "netns.h" > >>>>>> +#include "filecache.h" > >>>>>> + > >>>>>> +static const struct nfsd_localio_operations nfsd_localio_ops = { > >>>>>> + .nfsd_open_local_fh = nfsd_open_local_fh, > >>>>>> + .nfsd_file_put_local = nfsd_file_put_local, > >>>>>> + .nfsd_file_file = nfsd_file_file, > >>>>>> +}; > >>>>>> + > >>>>>> +void nfsd_localio_ops_init(void) > >>>>>> +{ > >>>>>> + memcpy(&nfs_to, &nfsd_localio_ops, sizeof(nfsd_localio_ops)); > >>>>>> +} > >>>>> > >>>>> Same comment as Neil: this should surface a pointer to the > >>>>> localio_ops struct. Copying the whole set of function pointers is > >>>>> generally unnecessary. > >>>>> > >>>>> > >>>>>> + > >>>>>> +/** > >>>>>> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file > >>>>>> + * > >>>>>> + * @uuid: nfs_uuid_t which provides the 'struct net' to get the proper nfsd_net > >>>>>> + * and the 'struct auth_domain' required for LOCALIO access > >>>>>> + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred > >>>>>> + * @cred: cred that the client established > >>>>>> + * @nfs_fh: filehandle to lookup > >>>>>> + * @fmode: fmode_t to use for open > >>>>>> + * > >>>>>> + * This function maps a local fh to a path on a local filesystem. > >>>>>> + * This is useful when the nfs client has the local server mounted - it can > >>>>>> + * avoid all the NFS overhead with reads, writes and commits. > >>>>>> + * > >>>>>> + * On successful return, returned nfsd_file will have its nf_net member > >>>>>> + * set. Caller (NFS client) is responsible for calling nfsd_serv_put and > >>>>>> + * nfsd_file_put (via nfs_to.nfsd_file_put_local). > >>>>>> + */ > >>>>>> +struct nfsd_file * > >>>>>> +nfsd_open_local_fh(nfs_uuid_t *uuid, > >>>>>> + struct rpc_clnt *rpc_clnt, const struct cred *cred, > >>>>>> + const struct nfs_fh *nfs_fh, const fmode_t fmode) > >>>>>> + __must_hold(rcu) > >>>>>> +{ > >>>>>> + int mayflags = NFSD_MAY_LOCALIO; > >>>>>> + struct nfsd_net *nn = NULL; > >>>>>> + struct net *net; > >>>>>> + struct svc_cred rq_cred; > >>>>>> + struct svc_fh fh; > >>>>>> + struct nfsd_file *localio; > >>>>>> + __be32 beres; > >>>>>> + > >>>>>> + if (nfs_fh->size > NFS4_FHSIZE) > >>>>>> + return ERR_PTR(-EINVAL); > >>>>>> + > >>>>>> + /* > >>>>>> + * Not running in nfsd context, so must safely get reference on nfsd_serv. > >>>>>> + * But the server may already be shutting down, if so disallow new localio. > >>>>>> + * uuid->net is NOT a counted reference, but caller's rcu_read_lock() ensures > >>>>>> + * that if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe > >>>>>> + * and if it succeeds we will have an implied reference to the net. > >>>>>> + */ > >>>>>> + net = rcu_dereference(uuid->net); > >>>>>> + if (net) > >>>>>> + nn = net_generic(net, nfsd_net_id); > >>>>>> + if (unlikely(!nn || !nfsd_serv_try_get(nn))) > >>>>>> + return ERR_PTR(-ENXIO); > >>>>>> + > >>>>>> + /* Drop the rcu lock for nfsd_file_acquire_local() */ > >>>>>> + rcu_read_unlock(); > >>>>> > >>>>> I'm struggling with the locking logistics. Caller takes the RCU read > >>>>> lock, this function drops the lock, then takes it again. So: > >>>>> > >>>>> - A caller might rely on the lock being held continuously, but > >>>>> - The API contract documented above doesn't indicate that this > >>>>> function drops that lock > >>>>> - The __must_hold(rcu) annotation doesn't indicate that this > >>>>> function drops that lock, IIUC > >>>>> > >>>>> Dropping and retaking the lock in here is an anti-pattern that > >>>>> should be avoided. I suggest we are better off in the long run if > >>>>> the caller does not need to take the RCU read lock, but instead, > >>>>> nfsd_open_local_fh takes it right here just for the rcu_dereference. > >>> > >>> I thought so too when I first saw how Neil approached fixing this to > >>> be safe. It was only after putting further time to it (and having the > >>> benefit of being so close to all this) that I realized the nuance at > >>> play (please see my reply to Jeff below for the nuance I'm speaking > >>> of). > >>> > >>>>> > >>>>> OTOH, Why drop the lock before calling nfsd_file_acquire_local()? > >>>>> The RCU read lock can safely be taken more than once in succession. > >>>>> > >>>>> Let's rethink the locking strategy. > >>>>> > >>> > >>> Yes, _that_ is a very valid point. I did wonder the same: it seems > >>> perfectly fine to simply retain the RCU throughout the entirety of > >>> nfsd_open_local_fh(). > >>> > >> > >> Nope. nfsd_file_do_acquire can allocate, so you can't hold the > >> rcu_read_lock over the whole thing. > > > > Ah, yeap.. sorry, I knew that ;) > > > >>>> Agreed. The only caller does this: > >>>> > >>>> rcu_read_lock(); > >>>> if (!rcu_access_pointer(uuid->net)) { > >>>> rcu_read_unlock(); > >>>> return ERR_PTR(-ENXIO); > >>>> } > >>>> localio = nfs_to.nfsd_open_local_fh(uuid, rpc_clnt, cred, > >>>> nfs_fh, fmode); > >>>> rcu_read_unlock(); > >>>> > >>>> Maybe just move the check for uuid->net down into nfsd_open_local_fh, > >>>> and it can acquire the rcu_read_lock for itself? > >>> > >>> No, sorry we cannot. The call to nfs_to.nfsd_open_local_fh (which is > >>> a symbol provided by nfsd) is only safe if the RCU protected pre-check > >>> shows the uuid->net valid. > >> > >> Ouch, ok. > > > > I had to double check but I did add a comment that speaks directly to > > this "nuance" above the code you quoted: > > > > /* > > * uuid->net must not be NULL, otherwise NFS may not have ref > > * on NFSD and therefore cannot safely make 'nfs_to' calls. > > */ > > > > So yeah, this code needs to stay like this. The __must_hold(rcu) just > > ensures the RCU is held on entry and exit.. the bouncing of RCU > > (dropping and retaking) isn't of immediate concern is it? While I > > agree it isn't ideal, it is what it is given: > > 1) NFS caller of NFSD symbol is only safe if it has RCU amd verified > > uuid->net valid > > 2) nfsd_file_do_acquire() can allocate. > > OK, understood, but the annotation is still wrong. The lock > is dropped here so I think you need __releases and __acquires > in that case. However... > > Let's wait for Neil's comments, but I think this needs to be > properly addressed before merging. The comments are not going > to be enough IMO. I don't have much to add. Mike's description of the locking requirement (nfs.to.foo need to be safe) and Jeff's confirmation that we cannot hold rcu across getting the nfsd_file are exactly what I would have said. I agree that dropping and reclaiming a lock is an anti-pattern and in best avoided in general. I cannot see a better alternative in this case. According to Documentation/dev-tools/sparse.txt: __must_hold - The specified lock is held on function entry and exit. __acquires - The specified lock is held on function exit, but not entry. __releases - The specified lock is held on function entry, but not exit. only __must_hold applies. But maybe sparse.txt is wrong. static struct bpf_local_storage_elem * bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, struct bpf_local_storage_elem *prev_selem) __acquires(RCU) __releases(RCU) gives a counter example. Maybe we should copy that as you say. Maybe we should fix sparse.txt too. NeilBrown