Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Sat, 22 Jan 2022 17:05:49 +0000

> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@xxxxxxxxxxxx> wrote:
> 
> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@xxxxxxxxxxxx> wrote:
>>> 
>>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
>>> RPC read layers"), a read of 0xfff is aligned up to server rsize of
>>> 0x1000.
>>> 
>>> As a result, in a test where the server has a file of size
>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
>>> returns an NFS code of EINVAL to the client. The client as a result
>>> indefinitely retries the request.
>> 
>> An infinite loop in this case is a client bug.
>> 
>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>> 5661 permits the NFSv4 READ operation to return
>> NFS4ERR_INVAL.
>> 
>> Was the client side fix for this issue rejected?
> 
> Yeah, see Trond's response in
> 
>   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@xxxxxxxxxxxxxxx/
> 
> So it is both a client and a server bug?

Splitting hairs, but yes there are issues on both sides
IMO. Bad behavior due to bugs on both sides is actually
not uncommon.

Trond is correct that the server does not correctly handle
the full range of values a READ request can carry.

However, as I pointed out, the specification permits NFS
servers to return NFS[34]ERR_INVAL on READ. And in fact,
there is already code in the NFSv4 READ path that returns
INVAL, for example:

        if (read->rd_offset >= OFFSET_MAX)
                return nfserr_inval;

I'm not sure the specifications describe precisely when
the server /must/ return INVAL, but the client needs to
be prepared to handle it reasonably. If INVAL results in
an infinite loop, then that's a client bug.
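
To illustrate what "handle it reasonably" could mean, here is a
minimal user-space sketch. It is not the Linux client's actual code
path; the helper name and the SKETCH_ prefixes are made up for this
example. Only the numeric status values come from RFC 1813. The idea
is that INVAL is treated as a permanent error for the request, and
only a transient status such as JUKEBOX causes a retry:

```c
#include <stdbool.h>
#include <stdint.h>

/* NFSv3 status values as assigned in RFC 1813. */
enum {
	SKETCH_NFS3_OK         = 0,
	SKETCH_NFS3ERR_INVAL   = 22,
	SKETCH_NFS3ERR_JUKEBOX = 10008,
};

/*
 * Hypothetical helper: decide whether a READ that came back with
 * @status should be sent again unchanged.
 */
static bool sketch_read_should_retry(uint32_t status)
{
	switch (status) {
	case SKETCH_NFS3ERR_JUKEBOX:
		/* Transient: the server asks the client to try later. */
		return true;
	case SKETCH_NFS3ERR_INVAL:
		/* Resending the same arguments gets the same answer,
		 * so report the error instead of looping. */
		return false;
	default:
		/* Success and all other errors are not retried here. */
		return false;
	}
}
```

With a policy like this, an INVAL reply ends the request with an
error returned to the caller rather than another identical READ.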

IMO changing the alignment for that case is a band-aid.
The underlying looping behavior is the root problem.
(So... I agree with Trond's NACK, but for different
reasons.)

>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 738d564ca4ce..754f4e9ff4a2 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> 	__be32 err;
>>> 
>>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>> +
>>> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> +		*count = NFS_OFFSET_MAX - offset;
>> 
>> Can @offset ever be larger than NFS_OFFSET_MAX?
> 
> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> (should it have been `>` rather?).

Don't think so, a zero-byte READ should be valid.

However it's rather interesting that it does not use
NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?
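
To make the question about @offset concrete, here is a user-space
sketch of the clamp, not the kernel's nfsd_read(). It assumes the
limit is the largest signed 64-bit value (which matches the file size
used in the test) and that the offset arrives as an unsigned 64-bit
value off the wire. The helper name and SKETCH_OFFSET_MAX are
hypothetical. The check compares the count against the remaining room
rather than computing offset + count, so the check itself cannot
overflow, and an out-of-range offset is rejected before the
subtraction:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for the server's limit: largest signed 64-bit offset. */
#define SKETCH_OFFSET_MAX ((uint64_t)INT64_MAX)

/*
 * Hypothetical helper: trim *count so that offset + *count does not
 * exceed SKETCH_OFFSET_MAX. Returns false if @offset is itself past
 * the limit; a caller would map that to an INVAL status.
 */
static bool sketch_clamp_read(uint64_t offset, uint64_t *count)
{
	if (offset > SKETCH_OFFSET_MAX)
		return false;

	/* Bytes available between @offset and the limit. */
	uint64_t room = SKETCH_OFFSET_MAX - offset;

	if (*count > room)
		*count = room;
	return true;
}
```

For the reported case, offset 0x7ffffffffffff000 with a count of
0x1000 leaves 0xfff bytes of room, so the count is trimmed back to
the 0xfff the application originally asked for. An offset equal to
the limit yields a zero-byte read in this sketch.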

--
Chuck Lever