Re: [PATCH v9 13/19] nfsd: add "localio" support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 01, 2024 at 08:22:56AM +1000, NeilBrown wrote:
> On Sun, 30 Jun 2024, Chuck Lever wrote:
> > Sorry, I guess I expected to have more time to learn about these
> > patches before writing review comments. But if you want them to go
> > in soon, I had better look more closely at them now.
> > 
> > 
> > On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> > > Pass the stored cl_nfssvc_net from the client to the server as
> > 
> > This is the only mention of cl_nfssvc_net I can find in this
> > patch. I'm not sure what it is. Patch description should maybe
> > provide some context.
> > 
> > 
> > > first argument to nfsd_open_local_fh() to ensure the proper network
> > > namespace is used for localio.
> > 
> > Can the patch description say something about the distinct mount 
> > namespaces -- if the local application is running in a different
> > container than the NFS server, are we using only the network
> > namespaces for authorizing the file access? And is that OK to do?
> > If yes, patch description should explain that NFS local I/O ignores
> > the boundaries of mount namespaces and why that is OK to do.
> > 
> > 
> > > Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Peng Tao <tao.peng@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Lance Shelton <lance.shelton@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/Makefile    |   1 +
> > >  fs/nfsd/filecache.c |   2 +-
> > >  fs/nfsd/localio.c   | 239 ++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/nfsd/nfssvc.c    |   1 +
> > >  fs/nfsd/trace.h     |   3 +-
> > >  fs/nfsd/vfs.h       |   9 ++
> > >  6 files changed, 253 insertions(+), 2 deletions(-)
> > >  create mode 100644 fs/nfsd/localio.c
> > > 
> > > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > > index b8736a82e57c..78b421778a79 100644
> > > --- a/fs/nfsd/Makefile
> > > +++ b/fs/nfsd/Makefile
> > > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> > >  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> > >  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> > >  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index ad9083ca144b..99631fa56662 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -52,7 +52,7 @@
> > >  #define NFSD_FILE_CACHE_UP		     (0)
> > >  
> > >  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > > -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> > > +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> > >  
> > >  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> > >  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > new file mode 100644
> > > index 000000000000..759a5cb79652
> > > --- /dev/null
> > > +++ b/fs/nfsd/localio.c
> > > @@ -0,0 +1,239 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * NFS server support for local clients to bypass network stack
> > > + *
> > > + * Copyright (C) 2014 Weston Andros Adamson <dros@xxxxxxxxxxxxxxx>
> > > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > + * Copyright (C) 2024 Mike Snitzer <snitzer@xxxxxxxxxxxxxxx>
> > > + */
> > > +
> > > +#include <linux/exportfs.h>
> > > +#include <linux/sunrpc/svcauth_gss.h>
> > > +#include <linux/sunrpc/clnt.h>
> > > +#include <linux/nfs.h>
> > > +#include <linux/string.h>
> > > +
> > > +#include "nfsd.h"
> > > +#include "vfs.h"
> > > +#include "netns.h"
> > > +#include "filecache.h"
> > > +
> > > +#define NFSDDBG_FACILITY		NFSDDBG_FH
> > 
> > With no more dprintk() call sites in this patch, you no longer need
> > this macro definition.
> > 
> > 
> > > +/*
> > > + * We need to translate between nfs status return values and
> > > + * the local errno values which may not be the same.
> > > + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> > > + *   all compiled nfs objects if it were in include/linux/nfs.h
> > > + */
> > > +static const struct {
> > > +	int stat;
> > > +	int errno;
> > > +} nfs_common_errtbl[] = {
> > > +	{ NFS_OK,		0		},
> > > +	{ NFSERR_PERM,		-EPERM		},
> > > +	{ NFSERR_NOENT,		-ENOENT		},
> > > +	{ NFSERR_IO,		-EIO		},
> > > +	{ NFSERR_NXIO,		-ENXIO		},
> > > +/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
> > > +	{ NFSERR_ACCES,		-EACCES		},
> > > +	{ NFSERR_EXIST,		-EEXIST		},
> > > +	{ NFSERR_XDEV,		-EXDEV		},
> > > +	{ NFSERR_NODEV,		-ENODEV		},
> > > +	{ NFSERR_NOTDIR,	-ENOTDIR	},
> > > +	{ NFSERR_ISDIR,		-EISDIR		},
> > > +	{ NFSERR_INVAL,		-EINVAL		},
> > > +	{ NFSERR_FBIG,		-EFBIG		},
> > > +	{ NFSERR_NOSPC,		-ENOSPC		},
> > > +	{ NFSERR_ROFS,		-EROFS		},
> > > +	{ NFSERR_MLINK,		-EMLINK		},
> > > +	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
> > > +	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
> > > +	{ NFSERR_DQUOT,		-EDQUOT		},
> > > +	{ NFSERR_STALE,		-ESTALE		},
> > > +	{ NFSERR_REMOTE,	-EREMOTE	},
> > > +#ifdef EWFLUSH
> > > +	{ NFSERR_WFLUSH,	-EWFLUSH	},
> > > +#endif
> > > +	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
> > > +	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
> > > +	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
> > > +	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
> > > +	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
> > > +	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
> > > +	{ NFSERR_BADTYPE,	-EBADTYPE	},
> > > +	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
> > > +	{ -1,			-EIO		}
> > > +};
> > > +
> > > +/**
> > > + * nfs_stat_to_errno - convert an NFS status code to a local errno
> > > + * @status: NFS status code to convert
> > > + *
> > > + * Returns a local errno value, or -EIO if the NFS status code is
> > > + * not recognized.  nfsd_file_acquire() returns an nfsstat that
> > > + * needs to be translated to an errno before being returned to a
> > > + * local client application.
> > > + */
> > > +static int nfs_stat_to_errno(enum nfs_stat status)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> > > +		if (nfs_common_errtbl[i].stat == (int)status)
> > > +			return nfs_common_errtbl[i].errno;
> > > +	}
> > > +	return nfs_common_errtbl[i].errno;
> > > +}
> > > +
> > > +static void
> > > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > > +{
> > > +	if (rqstp->rq_client)
> > > +		auth_domain_put(rqstp->rq_client);
> > > +	if (rqstp->rq_cred.cr_group_info)
> > > +		put_group_info(rqstp->rq_cred.cr_group_info);
> > > +	/* rpcauth_map_to_svc_cred_local() clears cr_principal */
> > > +	WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> > > +	kfree(rqstp->rq_xprt);
> > > +	kfree(rqstp);
> > > +}
> > > +
> > > +static struct svc_rqst *
> > > +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> > > +			const struct cred *cred)
> > > +{
> > > +	struct svc_rqst *rqstp;
> > > +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > +	int status;
> > > +
> > > +	/* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> > > +	if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> > > +		return ERR_PTR(-ENXIO);
> > > +
> > > +	rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > > +	if (!rqstp)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > > +	if (!rqstp->rq_xprt) {
> > > +		status = -ENOMEM;
> > > +		goto out_err;
> > > +	}
> > 
> > struct svc_rqst is pretty big (like, bigger than a couple of pages).
> > What happens if this allocation fails?
> > 
> > And how often does it occur -- does that add significant overhead?
> > 
> > 
> > > +
> > > +	rqstp->rq_xprt->xpt_net = net;
> > > +	__set_bit(RQ_SECURE, &rqstp->rq_flags);
> > > +	rqstp->rq_proc = 1;
> > > +	rqstp->rq_vers = 3;
> > 
> > IMO these need to be symbolic constants, not integers. Or, at least
> > there needs to be some documenting comments that explain these are
> > fake and why that's OK to do. Or, are there better choices?
> > 
> > 
> > > +	rqstp->rq_prot = IPPROTO_TCP;
> > > +	rqstp->rq_server = nn->nfsd_serv;
> > > +
> > > +	/* Note: we're connecting to ourself, so source addr == peer addr */
> > > +	rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> > > +			(struct sockaddr *)&rqstp->rq_addr,
> > > +			sizeof(rqstp->rq_addr));
> > > +
> > > +	rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> > > +
> > > +	/*
> > > +	 * set up enough for svcauth_unix_set_client to be able to wait
> > > +	 * for the cache downcall. Note that we do _not_ want to allow the
> > > +	 * request to be deferred for later revisit since this rqst and xprt
> > > +	 * are not set up to run inside of the normal svc_rqst engine.
> > > +	 */
> > > +	INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> > > +	kref_init(&rqstp->rq_xprt->xpt_ref);
> > > +	spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> > > +	rqstp->rq_chandle.thread_wait = 5 * HZ;
> > > +
> > > +	status = svcauth_unix_set_client(rqstp);
> > > +	switch (status) {
> > > +	case SVC_OK:
> > > +		break;
> > > +	case SVC_DENIED:
> > > +		status = -ENXIO;
> > > +		goto out_err;
> > > +	default:
> > > +		status = -ETIMEDOUT;
> > > +		goto out_err;
> > > +	}
> > 
> > Interesting. Why would svcauth_unix_set_client fail for a local I/O
> > request? Wouldn't it only be because the local application is trying
> > to open a file it doesn't have permission to?
> > 
> 
> I'm beginning to think this section of code is the of the sort where you
> need to be twice as clever when debugging as you where when writing.  It
> is trying to get the client to use interfaces written for server-side
> actions, and it isn't a good fit.
> 
> I think that instead we should modify fh_verify() so that it takes
> explicit net, rq_vers, rq_cred, rq_client as well as the rqstp, and
> the localio client passes in a NULL rqstp.

Nit: I'd rather provide a new fh_verify-like API -- changing the
synopsis of fh_verify() itself will result in a lot of code churn
for only a single call site.


> Getting the rq_client is an interesting challenge.
> The above code (if I'm reading it correctly) gets the server-side
> address of the IP connection, and passes that through to the sunrpc code
> as though it is the client address.  So as long as the server is
> exporting to itself, and as long as no address translation is happening
> on the path, this works.  It feels messy though - and fragile.
> 
> I would rather we had some rq_client (struct auth_domain) that was
> dedicated to localio.  The client should be able to access it based on
> the fact that it could rather the server UUID using the LOCALIO RPC
> protocol.
> 
> I'm not sure what exactly this would look like, but the 
> 'struct auth_domain *' should be something that can be accessed
> directly, not looked up in a cache.

I'd like to mitigate the possibility of having to wait for a
possible cache upcall, and reduce or remove the need for a phony
svc_rqst. It sounds like you are on that path.

Further, this needs to be clearly documented -- it's bypassing
(or perhaps augmenting) the export's usual IP address-based
authorization mechanism, so there are security considerations.


> I can try to knock up a patch to allow fh_verify (and nfsd_file_acquire)
> without an rqstp.  I won't try the auth_domain change until I hear what
> others think.

-- 
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux