Some additional comments on v2 below. We need to sort the NFS_OFFSET_MAX v. OFFSET_MAX issue before you send a v3. > On Jan 22, 2022, at 7:49 AM, Dan Aloni <dan.aloni@xxxxxxxxxxxx> wrote: > > Due to change in client 8cfb9015280d ("NFS: Always provide aligned > buffers to the RPC read layers"), a read of 0xfff is aligned up to > server rsize of 0x0fff. scripts/checkpatch.pl will complain about the way you name the commit here. It will want: Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") on the client, > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. > > This fixes the issue at server side by trimming reads past > NFS_OFFSET_MAX. It also adds a missing check for out of bound offset > in NFSv3. RFC 1813 section 3.3.6 does say this: >> It is possible for the server to return fewer than count >> bytes of data. If the server returns less than the count >> requested and eof set to FALSE, the client should issue >> another READ to get the remaining data. A server may >> return less data than requested under several >> circumstances. The file may have been truncated by another >> client or perhaps on the server itself, changing the file >> size from what the requesting client believes to be the >> case. This would reduce the actual amount of data >> available to the client. It is possible that the server >> may back off the transfer size and reduce the read request >> return. Server resource exhaustion may also occur >> necessitating a smaller read return. Similar language in RFC 8881 section 18.22.4: >> If the server returns a "short read" (i.e., fewer data than requested >> and eof is set to FALSE), the client should send another READ to get >> the remaining data. A server may return less data than requested >> under several circumstances. The file may have been truncated by >> another client or perhaps on the server itself, changing the file >> size from what the requesting client believes to be the case. This >> would reduce the actual amount of data available to the client. It >> is possible that the server reduce the transfer size and so return a >> short read result. Server resource exhaustion may also occur in a >> short read. So the server could be returning INVAL and leaving EOF set to false. That might be triggering the client to retry the READ. Does the server need to set the EOF flag if the READ arguments cross the limit of the range that the server can return? Does the client need to check for this case and stop retrying? The specs aren't particularly clear on this matter. > Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") It's arguable that you are fixing 8cfb9015280d. I don't think that commit is actually broken. Also, the server behavior is wrong even before that commit, and that commit is a client change... Mentioning this commit at the top of the patch description is fine, since that is how you discovered the problem, but I'd prefer if you drop this line. The patch does warrant a Cc: stable, though. > Signed-off-by: Dan Aloni <dan.aloni@xxxxxxxxxxxx> > --- > fs/nfsd/nfs4proc.c | 3 +++ > fs/nfsd/vfs.c | 11 +++++++++++ > 2 files changed, 14 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 486c5dba4b65..3b1e395a93b6 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > trace_nfsd_read_start(rqstp, &cstate->current_fh, > read->rd_offset, read->rd_length); > > + if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX)) > + read->rd_length = NFS_OFFSET_MAX - read->rd_offset; > + rd_offset is range-checked before the trace point, so I think this check needs to go before the trace point as well. That way the adjusted argument values are recorded. And we need to verify whether NFS_OFFSET_MAX is the correct constant for this check. > /* > * If we do a zero copy read, then a client will see read data > * that reflects the state of the file *after* performing the > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 738d564ca4ce..4a209f807466 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > __be32 err; > > trace_nfsd_read_start(rqstp, fhp, offset, *count); > + > + if (unlikely(offset > NFS_OFFSET_MAX)) { > + /* We can return this according to Section 3.3.6 */ RFC 1813 section 3.3.6 says that READ is permitted to return NFS3ERR_INVAL, but does not mandate that status code in this particular case (it's provided for general issues similar to this). So returning INVAL here is an implementation choice. Thus mentioning the spec here is IMO perhaps misleading, so I'd like you to drop this comment. > + err = nfserr_inval; > + goto error; > + } > + > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > + *count = NFS_OFFSET_MAX - offset; > + Same as above: these range-checks need to go before the trace point, IMO. > err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); > if (err) > return err; > @@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > nfsd_file_put(nf); > > +error: > trace_nfsd_read_done(rqstp, fhp, offset, *count); > > return err; > -- > 2.23.0 -- Chuck Lever