Re: [PATCH v1] NFS: Fix handling of reply page vector

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 8 Apr 2019 17:29:43 -0400

> On Apr 8, 2019, at 5:23 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2019-04-08 at 16:58 -0400, Chuck Lever wrote:
>>> On Apr 8, 2019, at 4:38 PM, Trond Myklebust <
>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, 2019-04-08 at 16:00 -0400, Chuck Lever wrote:
>>>> NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-
>>>> rc.
>>>> 
>>>> These two need the extra padding to be added directly to the
>>>> reply
>>>> length.
>>>> 
>>>> Reported-by: Olga Kornievskaia <aglo@xxxxxxxxx>
>>>> Fixes: 02ef04e432ba ("NFS: Account for XDR pad of buf->pages")
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>> fs/nfs/nfs4xdr.c |    4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>>>> index cfcabc3..6024461 100644
>>>> --- a/fs/nfs/nfs4xdr.c
>>>> +++ b/fs/nfs/nfs4xdr.c
>>>> @@ -2589,7 +2589,7 @@ static void nfs4_xdr_enc_getacl(struct
>>>> rpc_rqst
>>>> *req, struct xdr_stream *xdr,
>>>> 			ARRAY_SIZE(nfs4_acl_bitmap), &hdr);
>>>> 
>>>> 	rpc_prepare_reply_pages(req, args->acl_pages, 0,
>>>> -				args->acl_len, replen);
>>>> +				args->acl_len, replen + 1);
>>>> 	encode_nops(&hdr);
>>>> }
>>>> 
>>>> @@ -2811,7 +2811,7 @@ static void
>>>> nfs4_xdr_enc_fs_locations(struct
>>>> rpc_rqst *req,
>>>> 	}
>>>> 
>>>> 	rpc_prepare_reply_pages(req, (struct page **)&args->page, 0,
>>>> -				PAGE_SIZE, replen);
>>>> +				PAGE_SIZE, replen + 1);
>>>> 	encode_nops(&hdr);
>>>> }
>>>> 
>>> 
>>> I'm having trouble with the math here. Why are we pre-emptively
>>> subtracting a word from the tail? The header constants are always
>>> 4-bit 
>>> aligned because they are calculated as a word count, so I'm not
>>> understanding why we need commit 02ef04e432ba at all.
>>> 
>>> Can you please explain, Chuck?
>> 
>> The goal is to allocate a reply buffer that is just large enough
>> to fit the expected reply, and ensure that the variable-length
>> payload will start exactly where the xdr_buf's pages begin.
>> 
>> In cases where the payload length is not aligned to four bytes,
>> an extra quad has to be allocated in the tail. So, the total
>> reply length is increased by one quad so there is enough space
>> for the XDR pad bytes in the tail.
>> 
> Right, but we should never hit that problem because the proc->p_arglen
> and proc->p_replen are always in units of 32-bit words.

But buf->pages aren't. The point is to accommodate XDR padding
of the data payload, which goes into the page cache. You want
just the data to go into the page cache, and the padding to
land in the tail.

> IOW: the functions that allocate memory, will always do so in full
> words, hence it should not be necessary for xdr_inline_pages() to
> adjust that allocation.
> 
> The one thing that we _might_ want to do if we're to do anything at all
> is to perhaps adjust tail->iov_base by (xdr->page_len & 3) bytes to
> ensure that we have word-aligned data in the tail.
> 
> i.e. capture the padding in the remaining bytes in that first word, so
> that xdr_read_pages() sets word aligned values for xdr->p and xdr->end.

I'm in favor of hiding the details of that in the generic
XDR code, and I probably should have taken that step in
02ef04e432ba.

So I think we should add a quad to the tail whenever the
expected size of the data payload (ie, payload that goes into
the page cache) is going to be non-word aligned. Otherwise
the tail doesn't need that space. In those cases, the tail
can be eliminated.

The reason to do that is that then the RDMA transport does
not need to register a 4-byte memory region for the XDR
padding.

--
Chuck Lever