On Mon, 01 Jul 2024, Jeff Layton wrote:
> On Sun, 2024-06-30 at 16:15 -0400, Chuck Lever wrote:
> > On Sun, Jun 30, 2024 at 03:59:58PM -0400, Jeff Layton wrote:
> > > On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > > > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > > > +
> > > > > > > > > +	/* nfs_fh -> svc_fh */
> > > > > > > > > +	if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > > > +		status = -EINVAL;
> > > > > > > > > +		goto out;
> > > > > > > > > +	}
> > > > > > > > > +	fh_init(&fh, NFS4_FHSIZE);
> > > > > > > > > +	fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > > > +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > > > +
> > > > > > > > > +	if (fmode & FMODE_READ)
> > > > > > > > > +		mayflags |= NFSD_MAY_READ;
> > > > > > > > > +	if (fmode & FMODE_WRITE)
> > > > > > > > > +		mayflags |= NFSD_MAY_WRITE;
> > > > > > > > > +
> > > > > > > > > +	beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > > > +	if (beres) {
> > > > > > > > > +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > > > +		goto out_fh_put;
> > > > > > > > > +	}
> > > > > > > > > 
> > > > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > > > there a strong reason to use the file cache for locally opened
> > > > > > > > files? Jeff, any thoughts?
> > > > > > > > 
> > > > > > > > Will there be writeback ramifications for
> > > > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > > > guess).
> > > > > > > > 
> > > > > > > OK, going back to this: Since right here we immediately call
> > > > > > > 
> > > > > > > 	nfsd_file_put(nf);
> > > > > > > 
> > > > > > > There are no writeback ramifications nor any need to comment about
> > > > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > > > unnecessary) overhead for simply obtaining a struct file.
> > > > > > 
> > > > > > Easy enough change, probably best to avoid the filecache but would like
> > > > > > to verify with Jeff before switching:
> > > > > > 
> > > > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > > > --- a/fs/nfsd/localio.c
> > > > > > +++ b/fs/nfsd/localio.c
> > > > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > >  	const struct cred *save_cred;
> > > > > >  	struct svc_rqst *rqstp;
> > > > > >  	struct svc_fh fh;
> > > > > > -	struct nfsd_file *nf;
> > > > > >  	__be32 beres;
> > > > > >  
> > > > > >  	if (nfs_fh->size > NFS4_FHSIZE)
> > > > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > >  	if (fmode & FMODE_WRITE)
> > > > > >  		mayflags |= NFSD_MAY_WRITE;
> > > > > >  
> > > > > > -	beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > +	beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > > >  	if (beres) {
> > > > > >  		status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > >  		goto out_fh_put;
> > > > > >  	}
> > > > > > -	*pfilp = get_file(nf->nf_file);
> > > > > > -	nfsd_file_put(nf);
> > > > > > +	status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > > >  out_fh_put:
> > > > > >  	fh_put(&fh);
> > > > > >  	nfsd_local_fakerqst_destroy(rqstp);
> > > > > > 
> > > > > My suggestion would be to _not_ do this. I think you do want to use the
> > > > > filecache (mostly for performance reasons).
> > > > > 
> > > > But look carefully:
> > > > 
> > > > -- we're not calling nfsd_file_acquire_gc() here
> > > > 
> > > > -- we're immediately calling nfsd_file_put() on the returned nf
> > > > 
> > > > There's nothing left in the file cache when nfsd_open_local_fh()
> > > > returns. Each call here will do a full file open and a full close.
> > > > 
> > > Good point. This should be calling nfsd_file_acquire_gc(), IMO.
> > 
> > So that goes to my point yesterday about writeback ramifications.
> > 
> > If these open files linger in the file cache, then when will they
> > get written back to storage and by whom? Is it going to be an nfsd
> > thread writing them back as part of garbage collection?
> > 
> 
> Usually the client is issuing regular COMMITs. If that doesn't happen,
> then the flusher threads should get the rest.
> 
> Side note: I don't guess COMMIT goes over the localio path yet, does
> it? Maybe it should. It would be nice to not tie up an nfsd thread with
> writeback.

The documentation certainly claims that COMMIT uses the localio path.
I haven't double checked the code but I'd be very surprised if it didn't.

NeilBrown

> 
> > So, you're saying that the local I/O client will always behave like
> > NFSv3 in this regard, and open/read/close, open/write/close instead
> > of hanging on to the open file? That seems... suboptimal... and not
> > expected for a local file. That needs to be documented in the
> > LOCALIO design doc.
> > 
> 
> I imagine so, which is why I suggest using the filecache. If we get one
> READ or WRITE for the file via localio, we're pretty likely to get
> more. Why not amortize that file open over several operations?
> 
> > I'm also concerned about local applications closing a file but
> > having an open file handle linger in the file cache -- that can
> > prevent other accesses to the file until the GC ejects that open
> > file, as we've seen in the field.
> > 
> > IMHO nfsd_file_acquire_gc() is going to have some unwanted side
> > effects.
> > 
> 
> Typically, the client issues COMMIT calls when the client-side fd is
> closed (for CTO). While I think we do need to be able to deal with
> flushing files with dirty data that are left "hanging", that shouldn't
> be the common case. Most of the time, the client is going to be issuing
> regular COMMITs so that it can clean its pages.
> 
> IOW, I don't see how localio is any different than the case of normal
> v3 IO in this respect.
> -- 
> Jeff Layton <jlayton@xxxxxxxxxx>
> 
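
For reference, a minimal sketch of the hunk being debated above if it keeps
the filecache but switches to the garbage-collected entry point Jeff
suggests. This assumes nfsd_file_acquire_gc() takes the same arguments as
the nfsd_file_acquire() call quoted earlier; the variables (rqstp, fh,
mayflags, nf, pfilp, status) are the ones from the quoted
nfsd_open_local_fh() code, so treat it as an illustration rather than a
tested patch:

	/*
	 * Sketch: acquire through the garbage-collected filecache so the
	 * nfsd_file (and its open struct file) survives this call and can
	 * be reused by later localio READ/WRITE requests for the same
	 * file handle. Assumes nfsd_file_acquire_gc() mirrors
	 * nfsd_file_acquire()'s signature.
	 */
	beres = nfsd_file_acquire_gc(rqstp, &fh, mayflags, &nf);
	if (beres) {
		status = nfs_stat_to_errno(be32_to_cpu(beres));
		goto out_fh_put;
	}
	/* Hand the localio client its own reference on the struct file. */
	*pfilp = get_file(nf->nf_file);
	/*
	 * Drop our reference. With the _gc variant the entry lingers on the
	 * filecache LRU instead of being torn down immediately: that is the
	 * amortization Jeff wants, and also the lingering open file Chuck
	 * is worried about.
	 */
	nfsd_file_put(nf);
out_fh_put:
	fh_put(&fh);

The trade-off the thread is circling is exactly this: fh_verify() plus
nfsd_open_verified() opens and closes the file on every call, while the
_gc variant keeps the open file cached across calls at the cost of the
file staying open after the local application has closed it, until the
garbage collector ejects it.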