Re: [PATCH v2 19/47] nfsd41: DRC save, restore, and clear functions

On Wed, Apr 01, 2009 at 02:23:11PM -0400, Andy Adamson wrote:
>
> On Mar 31, 2009, at 7:03 PM, J. Bruce Fields wrote:
>
>> This one scares me.
>>
>> On Sat, Mar 28, 2009 at 11:32:32AM +0300, Benny Halevy wrote:
>>> From: Andy Adamson <andros@xxxxxxxxxx>
>>>
>>> Cache all the result pages, including the rpc header in rq_respages[0],
>>> for a request in the slot table cache entry.
>>>
>>> Cache the statp pointer from nfsd_dispatch which points into rq_respages[0]
>>> just past the rpc header. When setting a cache entry, calculate and save the
>>> length of the nfs data minus the rpc header for rq_respages[0].
>>>
>>> When replaying a cache entry, replace the cached rpc header with the
>>> replayed request rpc result header, unless there is not enough room in the
>>> cached results first page. In that case, use the cached rpc header.
>>>
>>> The sessions fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT
>>> * PAGE_SIZE. For compounds we are cacheing with operations such as READDIR
>>> that use the xdr_buf->pages to hold data, we choose to cache the extra page of
>>> data rather than copying data from xdr_buf->pages into the xdr_buf->head page.
>>>
>>> [nfsd41: limit cache to maxresponsesize_cached]
>>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>>> Signed-off-by: Benny Halevy <bhalevy@xxxxxxxxxxx>
>>> [nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1]
>>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>>> Signed-off-by: Benny Halevy <bhalevy@xxxxxxxxxxx>
>>> ---
>>> fs/nfsd/nfs4state.c        |  142 ++++++++++++++++++++++++++++++++++++++++++++
>>> fs/nfsd/nfssvc.c           |    4 +
>>> include/linux/nfsd/cache.h |    5 ++
>>> include/linux/nfsd/state.h |   13 ++++
>>> include/linux/nfsd/xdr4.h  |    4 +
>>> 5 files changed, 168 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 10eb67b..f0ce639 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -860,6 +860,148 @@ out_err:
>>> }
>>>
>>> #if defined(CONFIG_NFSD_V4_1)
>>> +void
>>> +nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp)
>>> +{
>>> +	struct nfsd4_compoundres *resp = rqstp->rq_resp;
>>> +
>>> +	resp->cstate.statp = statp;
>>> +}
>>> +
>>> +/*
>>> + * Dereference the result pages.
>>> + */
>>> +static void
>>> +nfsd4_release_respages(struct page **respages, short resused)
>>> +{
>>> +	int page_no;
>>> +
>>> +	dprintk("--> %s\n", __func__);
>>> +	for (page_no = 0; page_no < resused; page_no++) {
>>> +		if (!respages[page_no])
>>> +			continue;
>>> +		put_page(respages[page_no]);
>>> +		respages[page_no] = NULL;
>>> +	}
>>> +}
>>> +
>>> +static void
>>> +nfsd4_move_pages(struct page **topages, struct page **frompages, short count)
>>
>> s/move/copy/; we're not removing anything from the source.
>>
>>> +{
>>> +	int page_no;
>>
>> As a general matter of style, I'd rather any loop variable in a function
>> this short and simple be named "i".  "j" if you need another....
>>
>>> +
>>> +	for (page_no = 0; page_no < count; page_no++) {
>>> +		topages[page_no] = frompages[page_no];
>>> +		if (!topages[page_no])
>>> +			continue;
>>> +		get_page(topages[page_no]);
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * Cache the reply pages up to NFSD_PAGES_PER_SLOT + 1, clearing the previous
>>> + * pages. We add a page to NFSD_PAGES_PER_SLOT for the case where the total
>>> + * length of the XDR response is less than se_fmaxresp_cached
>>> + * (NFSD_PAGES_PER_SLOT * PAGE_SIZE) but the xdr_buf pages is used for a
>>> + * of the reply (e.g. readdir).
>>
>> That comment isn't very clear.
>>
>> Is one page really sufficient?  Consider, for example, a 2-byte read
>> which spans a page boundary:
>>
>> 	first page: rpc header, compound header, putfh reply, etc.
>> 	second page: 1st byte of read data
>> 	third page: 2nd byte of read data
>> 	fourth page: 2 bytes of padding, rest of reply.
>>
>> That's for a reply of total length less than a page.
>
> I didn't realize the VFS returned read data in this manner. I thought a 2 
> byte read would end up as the first two bytes in the first page of the 
> iovec presented to vfs_readv. Does the server actually send 4 pages of 
> data for a two byte read??

Whoops, sorry, I forgot this question--email's coming out of my ears.

In nfsd_vfs_read() there are two cases, one using splice, and one readv.
Trace through it and see that in the splice case nfsd_splice_actor() is
called, which just takes references to pages in the page cache and sets
page_base and page_len appropriately.  The kvec passed into
nfsd4_encode_read isn't used in that case.
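
Roughly, the shape of it is something like this (a toy sketch only, not
the kernel's nfsd_splice_actor; the structures below are made-up
stand-ins for struct page and the xdr_buf fields):

/*
 * Toy sketch: read data is not copied into the reply iovec.  Instead a
 * reference is taken on each page-cache page the data lives in, and the
 * reply records where the data starts (page_base) and its length
 * (page_len).
 */
struct toy_page;			/* stand-in for struct page */

struct toy_reply {
	struct toy_page	*pages[16];	/* stand-in for rq_respages[] */
	int		 nr_pages;
	unsigned int	 page_base;	/* offset of read data in pages[0] */
	unsigned int	 page_len;	/* total read data length */
};

static void toy_get_page(struct toy_page *p)
{
	(void)p;			/* real code bumps the page refcount */
}

/* Called once per page-cache page touched by the spliced read. */
static void toy_splice_one_page(struct toy_reply *resp, struct toy_page *page,
				unsigned int offset, unsigned int len)
{
	if (resp->page_len == 0)
		resp->page_base = offset;	/* data may start mid-page */
	toy_get_page(page);			/* reference, don't copy */
	resp->pages[resp->nr_pages++] = page;
	resp->page_len += len;
}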

So, yes, if you do a 2-byte read that straddles a page boundary (say,
starting at offset 4095 into the file), I believe you get a case like
the above.
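
To put numbers on that (a user-space sketch only; read_data_pages() is a
made-up helper and PAGE_SIZE is assumed to be 4096):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Whole page-cache pages referenced by a spliced read of 'len' bytes
 * starting at file offset 'offset'. */
static unsigned long read_data_pages(unsigned long offset, unsigned long len)
{
	unsigned long first = offset / PAGE_SIZE;
	unsigned long last = (offset + len - 1) / PAGE_SIZE;

	return last - first + 1;
}

int main(void)
{
	/* 2 bytes starting at the last byte of a page touch two pages,
	 * even though the whole XDR reply is far smaller than a page. */
	printf("%lu\n", read_data_pages(PAGE_SIZE - 1, 2));	/* prints 2 */

	/* Add the head page (rpc and compound headers before the data) and
	 * the tail page (padding and the rest of the reply after it), and
	 * the reply spans four pages, as in the layout quoted above. */
	return 0;
}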

And the splice case is important to performance--we shouldn't just
bypass it for 4.1.

--b.