Re: [PATCH v1 19/42] SUNRPC: Fix xdr_get_next_encode_buffer() page boundary handling

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 3 Mar 2021 18:19:33 +0000

> On Mar 3, 2021, at 11:52 AM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Wed, Mar 03, 2021 at 03:43:28PM +0000, Chuck Lever wrote:
>> Hi Bruce-
>> 
>> Thanks for your careful review of this series!
>> 
>> 
>>> On Mar 2, 2021, at 5:11 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, Mar 01, 2021 at 10:17:13AM -0500, Chuck Lever wrote:
>>>> The description of commit 2825a7f90753 ("nfsd4: allow encoding
>>>> across page boundaries") states:
>>>> 
>>>>> Also we can't handle a new operation starting close to the end of
>>>>> a page.
>>>> 
>>>> But does not detail why this is the case.
>>> 
>>> That wasn't every helpful of me, sorry.
>>> 
>>>> Subtracting the scratch buffer's "shift" value from the remaining
>>>> stream space seems to make reserving space close to the end of the
>>>> buf->pages array reliable.
>>> 
>>> So, why is that?
>>> 
>>> Thinking this through:
>>> 
>>> When somebody asks for a buffer that would straddle a page boundary,
>>> with frag1bytes at the end of this page and frag2bytes at the start of
>>> the next page, we instead give them a buffer starting at start of the
>>> next page.  That gives them a nice contiguous buffer to write into.
>>> When they're done using it, we fix things up by copying what they wrote
>>> back to where it should be.
>>> 
>>> That means we're temporarily wasting frag1bytes of space.  So, I don't
>>> know, maybe that's the logic behind subtracing frag1bytes from
>>> space_left.
>>> 
>>> It means you may end up with xdr->end frag1bytes short of the next page.
> 
> Wait, let me try that again:
> 
> 	p = page_address(*xdr->page_ptr);
> 	xdr->p = (void *)p + frag2bytes;
> 	space_left = xdr->buf->buflen - xdr->buf->len - frag1bytes;
>        xdr->end = (void *)p + min_t(int, space_left, PAGE_SIZE);
> 
> If you've still got a lot of buffer space left, then that'll put
> xdr->end frag2bytes past the end of a page, won't it?

So far I haven't heard any complaints from the kernel. I'll
instrument the code and see what comes up.

>>> I'm not sure that's right.
>> 
>> Why would that not be OK? the next call to xdr_get_next_encode_buffer()
>> should do the right thing and bounce the new encoded data from the
>> next page into this one again.
>> 
>> So far I have not encountered any problems. Would such a problem show
>> up with some frequency under normal use, or would it be especially
>> subtle?
> 
> I mainly just want to make sure we've got a coherent idea what this code
> is doing....

Agreed, that's a good thing.

> For testing: large replies that aren't just read data are readdir and
> getacl.  So reading large directories with lots of variably-named files
> might be good. Also pynfs could easily send a single compound with lots
> of variable-sized reads, that might be interesting.

I've run the full git regression suite over NFSv3 many many times with
this patch applied. That includes unpacking via tar, a full build from
scratch, and then running thousands of individual tests.

So that doesn't exercise a particular corner case, but it does reflect
a broad variety of directory operations.

> Constructing a compound that will result in xdr_reserve_space calls that
> exactly hit the various corner cases may be hard.  But maybe if we just
> send a bunch of compounds that vary over some range we can still
> guarantee hitting those cases.
> 
> --b.

--
Chuck Lever