Re: nfsd/filecache: add nfsd_file_acquire_gc_cached

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 25, 2024 at 02:00:56PM +1100, NeilBrown wrote:
> On Fri, 25 Oct 2024, Mike Snitzer wrote:
> > Rather than make nfsd_file_do_acquire() more complex (by training
> > it to conditionally skip both fh_verify() and nfsd_file allocation
> > and contruction) introduce nfsd_file_acquire_gc_cached() -- which
> > duplicates the minimalist subset of nfsd_file_do_acquire() needed to
> > achieve nfsd_file lookup using an opaque @inode_key.
> > 
> > nfsd_file_acquire_gc_cached() only returns a cached and GC'd nfsd_file
> > obtained using the opaque @inode_key, established from a previous call
> > to nfsd_file_do_acquire_local() that originally added the GC'd
> > nfsd_file to the filecache.
> > 
> > Update nfsd_open_local_fh to store @inode_key in @nfs_fh so later
> > calls can check if it maps to an open GC'd nfsd_file in the filecache
> > using nfsd_file_acquire_gc_cached().  Its nfsd_file_lookup_locked()
> > call will only find a match if @cred matches the nfsd_file's nf_cred.
> > 
> > And care is taken to clear the inode_key in nfsd_file_free() if the
> > nfsd_file has a non-NULL nf_inodep (which is a pointer to the address
> > of the opaque inode_key stored in the nfs_fh).  This avoids any risk
> > of re-using the old inode_key for a different nfsd_file.
> > 
> > This commit's cached nfsd_file lookup dramatically speeds up LOCALIO
> > performance, especially for small 4K O_DIRECT IO, e.g.:
> > 
> > before: read: IOPS=376k,  BW=1469MiB/s (1541MB/s)(28.7GiB/20001msec)
> > after:  read: IOPS=1037k, BW=4050MiB/s (4247MB/s)(79.1GiB/20002msec)
> > 
> > Note that LOCALIO calls nfsd_open_local_fh() for every IO it issues to
> > the underlying filesystem using the returned nfsd_file.  This is why
> > caching the opaque 'inode_key' in 'struct nfs_fh' is so helpful for
> > LOCALIO to quickly return the cached nfsd_file.  Whereas regular NFS
> > avoids fh_verify()'s costly duplicate lookups of the underlying
> > filesystem's dentry by caching it in 'fh_dentry' of 'struct svc_fh'.
> > LOCALIO cannot take the same approach, of storing the dentry, without
> > creating object lifetime issues associated with dentry references
> > holding server mounts open and preventing unmounts.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> 
> I think this is a good idea.  If we are to avoid a complete lookup for
> every IO we need some back-pointer from the nfsd filecache to something
> in nfs so that a cached lookup can be invalidated.  Various other
> schemes have been suggested before.  This one seems particularly simple.
> 
> I'm wondering about the request for a garbage-collected nfsd_file
> though.  For NFSv3 that makes sense.  For NFSv4 we would expect the file
> to already be open as a non-garbage-collected nfsd_file and opening it
> again seems wasteful.  That doesn't need to be fixed for this patch and
> maybe doesn't need to be fixed at all, but it seemed worth highlighting.

Maybe you're referring to the nfsd_file_acquire_gc_cached() kernel doc
text?  The code doesn't impose the nfsd_file must be a GC'd nfsd_file
(nor do I ever create an nfsd_file, if it isn't in the filecache then
it returns NULL).

GC'd just buys us a more likely chance of finding the nfsd_file in the
cache so it pairs nicely with GC having been requested/used when the
nfsd_file originally created and added to the cache.

Anyway, what follows in my reply is all moot given this thread has
teased out the desire to utilize a callback mechanism to allow the NFS
client to hold a longterm reference on the open nfsd_file.

BUT I will be folding in all the things covered below so I can at
least put nfsd_file_acquire_cached() out to pasture in more fully
formed state (should there ever be a need to revisit it)...

> More below
> 
> > ---
> >  fs/nfs/inode.c             |  3 ++
> >  fs/nfs_common/nfslocalio.c |  2 +-
> >  fs/nfsd/filecache.c        | 78 ++++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/filecache.h        |  7 ++++
> >  fs/nfsd/localio.c          | 46 +++++++++++++++++++---
> >  include/linux/nfs.h        |  4 ++
> >  include/linux/nfslocalio.h |  6 +--
> >  7 files changed, 136 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index cc7a32b32676..3051d65e3a89 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -2413,6 +2413,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
> >  #endif /* CONFIG_NFS_V4 */
> >  #ifdef CONFIG_NFS_V4_2
> >  	nfsi->xattr_cache = NULL;
> > +#endif
> > +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> > +	nfsi->fh.inode_key = NULL;
> >  #endif
> >  	nfs_netfs_inode_init(nfsi);
> >  
> > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > index 09404d142d1a..bacebaa1e15c 100644
> > --- a/fs/nfs_common/nfslocalio.c
> > +++ b/fs/nfs_common/nfslocalio.c
> > @@ -130,7 +130,7 @@ EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
> >  
> >  struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> >  		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > -		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > +		   struct nfs_fh *nfs_fh, const fmode_t fmode)
> >  {
> >  	struct net *net;
> >  	struct nfsd_file *localio;
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 1408166222c5..5ab978ac3555 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -221,6 +221,9 @@ nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need,
> >  	INIT_LIST_HEAD(&nf->nf_gc);
> >  	nf->nf_birthtime = ktime_get();
> >  	nf->nf_file = NULL;
> > +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> > +	nf->nf_inodep = NULL;
> > +#endif
> 
> All these "#if IS_ENABLED" are ugly.  I wonder if we could get rid of
> them.
> Using __GFP_ZERO for the alloc would work here, but might be an unwanted
> cost.  Maybe nf_inodep could be ignored if nf_file is NULL.

I did originally nest the nf_inodep backpointer reset within
nfsd_file_free()'s nf->nf_file check, will go back to that.

Then nf_inodep is otherwise ignored everywhere.

> >  	nf->nf_cred = get_current_cred();
> >  	nf->nf_net = net;
> >  	nf->nf_flags = want_gc ?
> > @@ -285,6 +288,12 @@ nfsd_file_free(struct nfsd_file *nf)
> >  		nfsd_file_check_write_error(nf);
> >  		nfsd_filp_close(nf->nf_file);
> >  	}
> > +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> > +	if (nf->nf_inodep) {
> > +		*(nf->nf_inodep) = NULL;
> > +		nf->nf_inodep = NULL;
> > +	}
> > +#endif
> 
> This one is harder to hide.  We don't really need the final assignment
> though so maybe we could
>
> #define NF_INODEP(nf) (nf->nf_inodep)
> or
> #define NF_INODEP(nf) (NULL)
> 
> in a header (where #if are more acceptable), then make this code:
> 
>  if (NF_INODEP(nf))
> 	*NF_INODEP(nf) = NULL;
> 
> Is that better or worse I wonder.

Not opposed to this, will do.

> >  
> >  	/*
> >  	 * If this item is still linked via nf_lru, that's a bug.
> > @@ -1255,6 +1264,75 @@ nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> >  	return beres;
> >  }
> >  
> > +/**
> > + * nfsd_file_acquire_cached - Get cached GC'd open file using inode
> > + * @net: The network namespace in which to perform a lookup
> > + * @cred: the user credential with which to validate access
> > + * @inode_key: inode to use as opaque lookup key
> > + * @may_flags: NFSD_MAY_ settings for the file
> > + * @pnf: OUT: found cached GC'd "struct nfsd_file" object
> > + *
> > + * Rather than make nfsd_file_do_acquire more complex (by training
> > + * it to conditionally skip fh_verify(), nfsd_file allocation and
> > + * contruction) duplicate the minimalist subset of it that is
> > + * needed to achieve nfsd_file lookup using the opaque @inode_key.
> > + *
> > + * The nfsd_file object returned by this API is reference-counted
> > + * and garbage-collected. The object is retained for a few
> > + * seconds after the final nfsd_file_put() in case the caller
> > + * wants to re-use it.
> > + *
> > + * Return values:
> > + *   %nfs_ok - @pnf points to an nfsd_file with its reference
> > + *   count boosted.
> > + *
> > + * On error, an nfsstat value in network byte order is returned.
> > + */
> > +__be32
> > +nfsd_file_acquire_cached(struct net *net, const struct cred *cred,
> > +			 void *inode_key, unsigned int may_flags,
> > +			 struct nfsd_file **pnf)
> > +{
> > +	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
> > +	struct nfsd_file *nf;
> > +	__be32 status;
> > +
> > +	rcu_read_lock();
> > +	nf = nfsd_file_lookup_locked(net, cred, inode_key, need, true);
> > +	rcu_read_unlock();
> > +
> > +	if (unlikely(!nf))
> > +		return nfserr_noent;
> > +
> > +	/*
> > +	 * If the nf is on the LRU then it holds an extra reference
> > +	 * that must be put if it's removed. It had better not be
> > +	 * the last one however, since we should hold another.
> > +	 */
> > +	if (nfsd_file_lru_remove(nf))
> > +		WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref));
> 
> Just use refcount_dec(&nf->nf_ref).  It will provide a warning if the
> count reaches zero.  In general you should never need warnings when
> using refcount as it generates the needed warnings itself.

OK, I lifted the code from nfsd_file_do_acquire() as a starting point;
so it should probably be cleaned up there (independent of this patch,
not it).

> > +
> > +	if (WARN_ON_ONCE(test_bit(NFSD_FILE_PENDING, &nf->nf_flags) ||
> > +			 !test_bit(NFSD_FILE_HASHED, &nf->nf_flags))) {
> > +		status = nfserr_inval;
> > +		goto error;
> > +	}
> 
> Do we really want the above?  I guess you were following the pattern in
> nfsd_file_do_acquire() which waits for FILE_PENDING and then re-tests
> FILE_HASHED (nfsd_file_lookup_locked() already tested it).
> I guess it doesn't hurt, but I'm not sure it helps.

Right, was just trying to maintain status-quo.  I think it doesn't
hurt, and may actually help given spin_lock isn't held around
nfsd_file_lookup_locked?  But the WARN_ON_ONCE should be removed.

> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > index f441cb9f74d5..34a229409117 100644
> > --- a/fs/nfsd/localio.c
> > +++ b/fs/nfsd/localio.c
> > @@ -58,33 +58,67 @@ void nfsd_localio_ops_init(void)
> >  struct nfsd_file *
> >  nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> >  		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > -		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > +		   struct nfs_fh *nfs_fh, const fmode_t fmode)
> >  {
> >  	int mayflags = NFSD_MAY_LOCALIO;
> >  	struct svc_cred rq_cred;
> >  	struct svc_fh fh;
> >  	struct nfsd_file *localio;
> > +	void *nf_inode_key;
> >  	__be32 beres;
> >  
> >  	if (nfs_fh->size > NFS4_FHSIZE)
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	/* nfs_fh -> svc_fh */
> > -	fh_init(&fh, NFS4_FHSIZE);
> > -	fh.fh_handle.fh_size = nfs_fh->size;
> > -	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > -
> >  	if (fmode & FMODE_READ)
> >  		mayflags |= NFSD_MAY_READ;
> >  	if (fmode & FMODE_WRITE)
> >  		mayflags |= NFSD_MAY_WRITE;
> >  
> > +	/*
> > +	 * Check if 'inode_key' stored in @nfs_fh maps to an
> > +	 * open nfsd_file in the filecache (from a previous
> > +	 * nfsd_open_local_fh).
> > +	 *
> > +	 * The 'inode_key' may have become stale (due to nfsd_file
> > +	 * being free'd by filecache GC) so the lookup will fail
> > +	 * gracefully by falling back to using nfsd_file_acquire_local().
> > +	 */
> > +	nf_inode_key = READ_ONCE(nfs_fh->inode_key);
> 
> I think you want the above in an rcu-locked region with
> nfsd_file_acquire_cached().  Else the inode could be freed and
> reallocated after the READ_ONCE and before the lookup.
> Maybe pass the address of inode_key and have nfsd_file_acquire_cached()
> deref after getting rcu_read_lock().

Good point, will do.

> > +	if (nf_inode_key) {
> > +		beres = nfsd_file_acquire_cached(net, cred,
> > +						 nf_inode_key,
> > +						 mayflags, &localio);
> > +		if (beres == nfs_ok)
> > +			return localio;
> > +		/*
> > +		 * reset stale nfs_fh->inode_key and fallthru
> > +		 * to use normal nfsd_file_acquire_local().
> > +		 */
> > +		nfs_fh->inode_key = NULL;
> > +	}

> > diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> > index 9ad727ddfedb..56c894575c70 100644
> > --- a/include/linux/nfs.h
> > +++ b/include/linux/nfs.h
> > @@ -29,6 +29,10 @@
> >  struct nfs_fh {
> >  	unsigned short		size;
> >  	unsigned char		data[NFS_MAXFHSIZE];
> > +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> > +	/* 'inode_key' is an opaque key used for nfsd_file hash lookups */
> > +	void *			inode_key;
> > +#endif
> 
> I wonder if this is really the right place to store the inode_key.
> 'struct nfs_fh' appears in lots of places and the only place where you
> want the inode_key is inside the nfs_inode.  Maybe store an augmented
> nfs_fh in there...

Yeah, I went with nfs_fh because it served my needs (and nfs_fh is
passed to nfs_open_local_fh).  But very much open to suggestions.
While I'm not yet sure about your nfs_inode suggestion, I'll certainly
take a closer look after addressing your other feedback.

Thanks for your review!

Mike




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux