On Thu, 2022-09-08 at 15:36 +0000, Chuck Lever III wrote: > > > > On Sep 7, 2022, at 6:33 PM, Trond Myklebust > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Wed, 2022-09-07 at 20:51 +0000, Chuck Lever III wrote: > > > > > > > > > > On Sep 7, 2022, at 4:37 PM, Anna Schumaker <anna@xxxxxxxxxx> > > > > wrote: > > > > > > > > On Wed, Sep 7, 2022 at 4:29 PM Chuck Lever III > > > > <chuck.lever@xxxxxxxxxx> wrote: > > > > > > > > > > Be sure to Cc: Jeff on these. Thanks! > > > > > > > > > > > > > > > > On Sep 7, 2022, at 3:52 PM, Anna Schumaker > > > > > > <anna@xxxxxxxxxx> > > > > > > wrote: > > > > > > > > > > > > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > > > > > > > > > > > Chuck had suggested reverting READ_PLUS so it returns a > > > > > > single > > > > > > DATA > > > > > > segment covering the requested read range. This prepares > > > > > > the > > > > > > server for > > > > > > a future "sparse read" function so support can easily be > > > > > > added > > > > > > without > > > > > > needing to rip out the old READ_PLUS code at the same time. > > > > > > > > > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > > > > > --- > > > > > > fs/nfsd/nfs4xdr.c | 139 +++++++++++------------------------ > > > > > > ---- > > > > > > ------- > > > > > > 1 file changed, 32 insertions(+), 107 deletions(-) > > > > > > > > > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > > > > > index 1e9690a061ec..bcc8c385faf2 100644 > > > > > > --- a/fs/nfsd/nfs4xdr.c > > > > > > +++ b/fs/nfsd/nfs4xdr.c > > > > > > @@ -4731,79 +4731,37 @@ nfsd4_encode_offload_status(struct > > > > > > nfsd4_compoundres *resp, __be32 nfserr, > > > > > > > > > > > > static __be32 > > > > > > nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, > > > > > > - struct nfsd4_read *read, > > > > > > - unsigned long *maxcount, u32 > > > > > > *eof, > > > > > > - loff_t *pos) > > > > > > + struct nfsd4_read *read) > > > > > > { > > > > > > - struct xdr_stream *xdr = resp->xdr; > > > > > > + bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp- > > > > > > > rq_flags); > > > > > > struct file *file = read->rd_nf->nf_file; > > > > > > - int starting_len = xdr->buf->len; > > > > > > - loff_t hole_pos; > > > > > > - __be32 nfserr; > > > > > > - __be32 *p, tmp; > > > > > > - __be64 tmp64; > > > > > > - > > > > > > - hole_pos = pos ? *pos : vfs_llseek(file, read- > > > > > > >rd_offset, > > > > > > SEEK_HOLE); > > > > > > - if (hole_pos > read->rd_offset) > > > > > > - *maxcount = min_t(unsigned long, *maxcount, > > > > > > hole_pos - read->rd_offset); > > > > > > - *maxcount = min_t(unsigned long, *maxcount, (xdr- > > > > > > >buf- > > > > > > > buflen - xdr->buf->len)); > > > > > > + struct xdr_stream *xdr = resp->xdr; > > > > > > + unsigned long maxcount; > > > > > > + __be32 nfserr, *p; > > > > > > > > > > > > /* Content type, offset, byte count */ > > > > > > p = xdr_reserve_space(xdr, 4 + 8 + 4); > > > > > > if (!p) > > > > > > - return nfserr_resource; > > > > > > + return nfserr_io; > > > > > > > > > > Wouldn't nfserr_rep_too_big be a more appropriate status for > > > > > running > > > > > off the end of the send buffer? I'm not 100% sure, but I > > > > > would > > > > > expect > > > > > that exhausting send buffer space would imply the reply has > > > > > grown > > > > > too > > > > > large. > > > > > > > > I can switch it to that, no problem. > > > > > > I would like to hear opinions from protocol experts before we go > > > with that choice. > > > > I'd agree that NFS4ERR_REP_TOO_BIG is correct if you're not able to > > even return a short read. However if you can return a short read, > > then > > that's better than returning an error. > > Many thanks for reviewing! > > In general, I might want to convert all NFSv4 encoders to return > either RESOURCE or REP_TOO_BIG when xdr_reserve_space() fails. I > can post a patch or two that makes that attempt so the special > cases can be sorted on the mailing list. > > > > It looks to me as if this function can bit hit in both cases, so > > perhaps some care is in order. > > Intriguing idea. > > For READ, if xdr_reserve_space() fails, that's all she wrote. > I think all these calls happen before the data payload is encoded. > > For READ_PLUS, if xdr_reserve_space() fails, it might be possible > to use xdr_truncate_encode() or simply start over with a limit on > the number of segments to be encoded. We're not really there yet, > as currently we want to return just a single segment at this > point. > > I feel the question is whether it's practical or a frequent > enough occurrence to bother with the special cases. Encoding a > READ_PLUS response is already challenging. What I'm saying is that you are more likely to hit this issue when you have a reply with something like "<data><hole><data>"... If the second "<data>" chunk hits the above xdr_reserve_space() limit (i.e. the first call to nfsd4_encode_read_plus_data() succeeds as does the call to nfsd4_encode_read_plus_hole()), then you just want to return a short read of the form "<data><hole>" instead of returning an error on the whole READ_PLUS operation. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx