> On Sep 3, 2024, at 11:29 AM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote: > > On Tue, Sep 03, 2024 at 11:19:45AM -0400, Jeff Layton wrote: >> On Tue, 2024-09-03 at 11:00 -0400, Mike Snitzer wrote: >>> On Tue, Sep 03, 2024 at 10:40:28AM -0400, Jeff Layton wrote: >>>> On Tue, 2024-09-03 at 10:34 -0400, Chuck Lever wrote: >>>>> On Sat, Aug 31, 2024 at 06:37:36PM -0400, Mike Snitzer wrote: >>>>>> From: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> >>>>>> >>>>>> Add server support for bypassing NFS for localhost reads, writes, and >>>>>> commits. This is only useful when both the client and server are >>>>>> running on the same host. >>>>>> >>>>>> If nfsd_open_local_fh() fails then the NFS client will both retry and >>>>>> fallback to normal network-based read, write and commit operations if >>>>>> localio is no longer supported. >>>>>> >>>>>> Care is taken to ensure the same NFS security mechanisms are used >>>>>> (authentication, etc) regardless of whether localio or regular NFS >>>>>> access is used. The auth_domain established as part of the traditional >>>>>> NFS client access to the NFS server is also used for localio. Store >>>>>> auth_domain for localio in nfsd_uuid_t and transfer it to the client >>>>>> if it is local to the server. >>>>>> >>>>>> Relative to containers, localio gives the client access to the network >>>>>> namespace the server has. This is required to allow the client to >>>>>> access the server's per-namespace nfsd_net struct. >>>>>> >>>>>> This commit also introduces the use of NFSD's percpu_ref to interlock >>>>>> nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is >>>>>> not destroyed while in use by nfsd_open_local_fh and other LOCALIO >>>>>> client code. >>>>>> >>>>>> CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. >>>>>> >>>>>> Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> >>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>>> Co-developed-by: Mike Snitzer <snitzer@xxxxxxxxxx> >>>>>> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx> >>>>>> Co-developed-by: NeilBrown <neilb@xxxxxxx> >>>>>> Signed-off-by: NeilBrown <neilb@xxxxxxx> >>>>>> >>>>>> Not-Acked-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>> Not-Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> >>>>>> --- >>>>>> fs/nfsd/Makefile | 1 + >>>>>> fs/nfsd/filecache.c | 2 +- >>>>>> fs/nfsd/localio.c | 112 +++++++++++++++++++++++++++++++++++++ >>>>>> fs/nfsd/netns.h | 4 ++ >>>>>> fs/nfsd/nfsctl.c | 25 ++++++++- >>>>>> fs/nfsd/trace.h | 3 +- >>>>>> fs/nfsd/vfs.h | 2 + >>>>>> include/linux/nfslocalio.h | 8 +++ >>>>>> 8 files changed, 154 insertions(+), 3 deletions(-) >>>>>> create mode 100644 fs/nfsd/localio.c >>>>>> >>>>>> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile >>>>>> index b8736a82e57c..18cbd3fa7691 100644 >>>>>> --- a/fs/nfsd/Makefile >>>>>> +++ b/fs/nfsd/Makefile >>>>>> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o >>>>>> nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o >>>>>> nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o >>>>>> nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o >>>>>> +nfsd-$(CONFIG_NFS_LOCALIO) += localio.o >>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c >>>>>> index 89ff380ec31e..348c1b97092e 100644 >>>>>> --- a/fs/nfsd/filecache.c >>>>>> +++ b/fs/nfsd/filecache.c >>>>>> @@ -52,7 +52,7 @@ >>>>>> #define NFSD_FILE_CACHE_UP (0) >>>>>> >>>>>> /* We only care about NFSD_MAY_READ/WRITE for this cache */ >>>>>> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) >>>>>> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO) >>>>>> >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); >>>>>> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c >>>>>> new file mode 100644 >>>>>> index 000000000000..75df709c6903 >>>>>> --- /dev/null >>>>>> +++ b/fs/nfsd/localio.c >>>>>> @@ -0,0 +1,112 @@ >>>>>> +// SPDX-License-Identifier: GPL-2.0-only >>>>>> +/* >>>>>> + * NFS server support for local clients to bypass network stack >>>>>> + * >>>>>> + * Copyright (C) 2014 Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> >>>>>> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>>> + * Copyright (C) 2024 Mike Snitzer <snitzer@xxxxxxxxxxxxxxx> >>>>>> + * Copyright (C) 2024 NeilBrown <neilb@xxxxxxx> >>>>>> + */ >>>>>> + >>>>>> +#include <linux/exportfs.h> >>>>>> +#include <linux/sunrpc/svcauth.h> >>>>>> +#include <linux/sunrpc/clnt.h> >>>>>> +#include <linux/nfs.h> >>>>>> +#include <linux/nfs_common.h> >>>>>> +#include <linux/nfslocalio.h> >>>>>> +#include <linux/string.h> >>>>>> + >>>>>> +#include "nfsd.h" >>>>>> +#include "vfs.h" >>>>>> +#include "netns.h" >>>>>> +#include "filecache.h" >>>>>> + >>>>>> +static const struct nfsd_localio_operations nfsd_localio_ops = { >>>>>> + .nfsd_open_local_fh = nfsd_open_local_fh, >>>>>> + .nfsd_file_put_local = nfsd_file_put_local, >>>>>> + .nfsd_file_file = nfsd_file_file, >>>>>> +}; >>>>>> + >>>>>> +void nfsd_localio_ops_init(void) >>>>>> +{ >>>>>> + memcpy(&nfs_to, &nfsd_localio_ops, sizeof(nfsd_localio_ops)); >>>>>> +} >>>>> >>>>> Same comment as Neil: this should surface a pointer to the >>>>> localio_ops struct. Copying the whole set of function pointers is >>>>> generally unnecessary. >>>>> >>>>> >>>>>> + >>>>>> +/** >>>>>> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file >>>>>> + * >>>>>> + * @uuid: nfs_uuid_t which provides the 'struct net' to get the proper nfsd_net >>>>>> + * and the 'struct auth_domain' required for LOCALIO access >>>>>> + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred >>>>>> + * @cred: cred that the client established >>>>>> + * @nfs_fh: filehandle to lookup >>>>>> + * @fmode: fmode_t to use for open >>>>>> + * >>>>>> + * This function maps a local fh to a path on a local filesystem. >>>>>> + * This is useful when the nfs client has the local server mounted - it can >>>>>> + * avoid all the NFS overhead with reads, writes and commits. >>>>>> + * >>>>>> + * On successful return, returned nfsd_file will have its nf_net member >>>>>> + * set. Caller (NFS client) is responsible for calling nfsd_serv_put and >>>>>> + * nfsd_file_put (via nfs_to.nfsd_file_put_local). >>>>>> + */ >>>>>> +struct nfsd_file * >>>>>> +nfsd_open_local_fh(nfs_uuid_t *uuid, >>>>>> + struct rpc_clnt *rpc_clnt, const struct cred *cred, >>>>>> + const struct nfs_fh *nfs_fh, const fmode_t fmode) >>>>>> + __must_hold(rcu) >>>>>> +{ >>>>>> + int mayflags = NFSD_MAY_LOCALIO; >>>>>> + struct nfsd_net *nn = NULL; >>>>>> + struct net *net; >>>>>> + struct svc_cred rq_cred; >>>>>> + struct svc_fh fh; >>>>>> + struct nfsd_file *localio; >>>>>> + __be32 beres; >>>>>> + >>>>>> + if (nfs_fh->size > NFS4_FHSIZE) >>>>>> + return ERR_PTR(-EINVAL); >>>>>> + >>>>>> + /* >>>>>> + * Not running in nfsd context, so must safely get reference on nfsd_serv. >>>>>> + * But the server may already be shutting down, if so disallow new localio. >>>>>> + * uuid->net is NOT a counted reference, but caller's rcu_read_lock() ensures >>>>>> + * that if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe >>>>>> + * and if it succeeds we will have an implied reference to the net. >>>>>> + */ >>>>>> + net = rcu_dereference(uuid->net); >>>>>> + if (net) >>>>>> + nn = net_generic(net, nfsd_net_id); >>>>>> + if (unlikely(!nn || !nfsd_serv_try_get(nn))) >>>>>> + return ERR_PTR(-ENXIO); >>>>>> + >>>>>> + /* Drop the rcu lock for nfsd_file_acquire_local() */ >>>>>> + rcu_read_unlock(); >>>>> >>>>> I'm struggling with the locking logistics. Caller takes the RCU read >>>>> lock, this function drops the lock, then takes it again. So: >>>>> >>>>> - A caller might rely on the lock being held continuously, but >>>>> - The API contract documented above doesn't indicate that this >>>>> function drops that lock >>>>> - The __must_hold(rcu) annotation doesn't indicate that this >>>>> function drops that lock, IIUC >>>>> >>>>> Dropping and retaking the lock in here is an anti-pattern that >>>>> should be avoided. I suggest we are better off in the long run if >>>>> the caller does not need to take the RCU read lock, but instead, >>>>> nfsd_open_local_fh takes it right here just for the rcu_dereference. >>> >>> I thought so too when I first saw how Neil approached fixing this to >>> be safe. It was only after putting further time to it (and having the >>> benefit of being so close to all this) that I realized the nuance at >>> play (please see my reply to Jeff below for the nuance I'm speaking >>> of). >>> >>>>> >>>>> OTOH, Why drop the lock before calling nfsd_file_acquire_local()? >>>>> The RCU read lock can safely be taken more than once in succession. >>>>> >>>>> Let's rethink the locking strategy. >>>>> >>> >>> Yes, _that_ is a very valid point. I did wonder the same: it seems >>> perfectly fine to simply retain the RCU throughout the entirety of >>> nfsd_open_local_fh(). >>> >> >> Nope. nfsd_file_do_acquire can allocate, so you can't hold the >> rcu_read_lock over the whole thing. > > Ah, yeap.. sorry, I knew that ;) > >>>> Agreed. The only caller does this: >>>> >>>> rcu_read_lock(); >>>> if (!rcu_access_pointer(uuid->net)) { >>>> rcu_read_unlock(); >>>> return ERR_PTR(-ENXIO); >>>> } >>>> localio = nfs_to.nfsd_open_local_fh(uuid, rpc_clnt, cred, >>>> nfs_fh, fmode); >>>> rcu_read_unlock(); >>>> >>>> Maybe just move the check for uuid->net down into nfsd_open_local_fh, >>>> and it can acquire the rcu_read_lock for itself? >>> >>> No, sorry we cannot. The call to nfs_to.nfsd_open_local_fh (which is >>> a symbol provided by nfsd) is only safe if the RCU protected pre-check >>> shows the uuid->net valid. >> >> Ouch, ok. > > I had to double check but I did add a comment that speaks directly to > this "nuance" above the code you quoted: > > /* > * uuid->net must not be NULL, otherwise NFS may not have ref > * on NFSD and therefore cannot safely make 'nfs_to' calls. > */ > > So yeah, this code needs to stay like this. The __must_hold(rcu) just > ensures the RCU is held on entry and exit.. the bouncing of RCU > (dropping and retaking) isn't of immediate concern is it? While I > agree it isn't ideal, it is what it is given: > 1) NFS caller of NFSD symbol is only safe if it has RCU amd verified > uuid->net valid > 2) nfsd_file_do_acquire() can allocate. OK, understood, but the annotation is still wrong. The lock is dropped here so I think you need __releases and __acquires in that case. However... Let's wait for Neil's comments, but I think this needs to be properly addressed before merging. The comments are not going to be enough IMO. -- Chuck Lever