Re: [pnfs] [PATCH v2 19/47] nfsd41: DRC save, restore, and clear functions

On Thu, Apr 2, 2009 at 5:16 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Wed, Apr 01, 2009 at 02:23:11PM -0400, Andy Adamson wrote:
>>
>> On Mar 31, 2009, at 7:03 PM, J. Bruce Fields wrote:
>>
>>> This one scares me.
>>>
>>> On Sat, Mar 28, 2009 at 11:32:32AM +0300, Benny Halevy wrote:
>>>> From: Andy Adamson <andros@xxxxxxxxxx>
>>>>
>>>> Cache all the result pages, including the rpc header in
>>>> rq_respages[0], for a request in the slot table cache entry.
>>>>
>>>> Cache the statp pointer from nfsd_dispatch, which points into
>>>> rq_respages[0] just past the rpc header. When setting a cache entry,
>>>> calculate and save the length of the nfs data minus the rpc header
>>>> for rq_respages[0].
>>>>
>>>> When replaying a cache entry, replace the cached rpc header with the
>>>> replayed request's rpc result header, unless there is not enough room
>>>> in the cached result's first page. In that case, use the cached rpc
>>>> header.
>>>>
>>>> The session's fore channel maxresponsesize_cached is set to
>>>> NFSD_PAGES_PER_SLOT * PAGE_SIZE. For compounds we are caching with
>>>> operations such as READDIR that use the xdr_buf->pages to hold data,
>>>> we choose to cache the extra page of data rather than copying data
>>>> from xdr_buf->pages into the xdr_buf->head page.
>>>>
>>>> [nfsd41: limit cache to maxresponsesize_cached]
>>>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>>>> Signed-off-by: Benny Halevy <bhalevy@xxxxxxxxxxx>
>>>> [nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1]
>>>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>>>> Signed-off-by: Benny Halevy <bhalevy@xxxxxxxxxxx>
>>>> ---
>>>> fs/nfsd/nfs4state.c        |  142 ++++++++++++++++++++++++++++++++++++++++++++
>>>> fs/nfsd/nfssvc.c           |    4 +
>>>> include/linux/nfsd/cache.h |    5 ++
>>>> include/linux/nfsd/state.h |   13 ++++
>>>> include/linux/nfsd/xdr4.h  |    4 +
>>>> 5 files changed, 168 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index 10eb67b..f0ce639 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -860,6 +860,148 @@ out_err:
>>>> }
>>>>
>>>> #if defined(CONFIG_NFSD_V4_1)
>>>> +void
>>>> +nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp)
>>>> +{
>>>> +   struct nfsd4_compoundres *resp = rqstp->rq_resp;
>>>> +
>>>> +   resp->cstate.statp = statp;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Dereference the result pages.
>>>> + */
>>>> +static void
>>>> +nfsd4_release_respages(struct page **respages, short resused)
>>>> +{
>>>> +   int page_no;
>>>> +
>>>> +   dprintk("--> %s\n", __func__);
>>>> +   for (page_no = 0; page_no < resused; page_no++) {
>>>> +           if (!respages[page_no])
>>>> +                   continue;
>>>> +           put_page(respages[page_no]);
>>>> +           respages[page_no] = NULL;
>>>> +   }
>>>> +}
>>>> +
>>>> +static void
>>>> +nfsd4_move_pages(struct page **topages, struct page **frompages, short count)
>>>
>>> s/move/copy/; we're not removing anything from the source.
>>>
>>>> +{
>>>> +   int page_no;
>>>
>>> As a general matter of style, I'd rather any loop variable in a
>>> function
>>> this short and simple be named "i".  "j" if you need another....
>>>
>>>> +
>>>> +   for (page_no = 0; page_no < count; page_no++) {
>>>> +           topages[page_no] = frompages[page_no];
>>>> +           if (!topages[page_no])
>>>> +                   continue;
>>>> +           get_page(topages[page_no]);
>>>> +   }
>>>> +}
>>>> +
>>>> +/*
>>>> + * Cache the reply pages up to NFSD_PAGES_PER_SLOT + 1, clearing the
>>>> + * previous pages. We add a page to NFSD_PAGES_PER_SLOT for the case
>>>> + * where the total length of the XDR response is less than
>>>> + * se_fmaxresp_cached (NFSD_PAGES_PER_SLOT * PAGE_SIZE) but the
>>>> + * xdr_buf pages are used for part of the reply (e.g. readdir).
>>>
>>> That comment isn't very clear.
>>>
>>> Is one page really sufficient?  Consider, for example, a 2-byte read
>>> which spans a page boundary:
>>>
>>>      first page: rpc header, compound header, putfh reply, etc.
>>>      second page: 1st byte of read data
>>>      third page: 2nd byte of read data
>>>      fourth page: 2 bytes of padding, rest of reply.
>>>
>>> That's for a reply of total length less than a page.
>>
>> I didn't realize the VFS returned read data in this manner. I thought a 2
>> byte read would end up as the first two bytes in the first page of the
>> iovec presented to vfs_readv. Does the server actually send 4 pages of
>> data for a two byte read??
>
> Whoops, sorry, I forgot this question--email's coming out of my ears.

That's ok!

>
> In nfsd_vfs_read() there are two cases, one using splice, and one readv.
> Trace through it and see that in the splice case nfsd_splice_actor() is
> called, which just takes references to pages in the page cache and sets
> page_base and page_len appropriately.  The kvec passed into
> nfsd4_encode_read isn't used in that case.
>
> So, yes, if you do a 2-byte read from offset 1023 into a file (for
> example), I believe you get a case like the above.
>
> And the splice case is important to performance--we shouldn't just
> bypass it for 4.1.

OK. I've come to see that I've done this bass-ackwards. I'm caching the
original page with the rpc header and data, then copying the new rpc
header (which may not fit!) into the replay page, replacing the old
rpc header.

What I should be doing is saving the encoded operations (not including
the sequence op, as you suggested) and copying the cached encoded
operations into the new reply page. This will also get us past the
nfsd_splice_actor() case, because we can just cache the actual data (the
two bytes in the above example).
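
[A user-space sketch of that replay scheme. The names (slot_cache,
cache_encoded_ops, replay_reply) and sizes are hypothetical, just to show the
copy direction -- cache only the encoded op results, never the rpc header:]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot cache entry: holds only the encoded operation
 * results, starting after the sequence op. */
struct slot_cache {
	unsigned char data[512];
	size_t len;
};

/* At reply time, save the encoded ops (sequence op excluded). */
static void cache_encoded_ops(struct slot_cache *slot,
			      const unsigned char *ops, size_t len)
{
	assert(len <= sizeof(slot->data));
	memcpy(slot->data, ops, len);
	slot->len = len;
}

/*
 * On replay, the fresh rpc header and sequence op are encoded into the
 * new reply first (hdr_len bytes), then the cached bytes are appended.
 * No patching of a cached header is needed, and spliced data pages were
 * flattened into the cache when the entry was set.
 */
static size_t replay_reply(const struct slot_cache *slot,
			   unsigned char *reply, size_t hdr_len)
{
	memcpy(reply + hdr_len, slot->data, slot->len);
	return hdr_len + slot->len;
}
```

[The win is that the replayed header is always freshly encoded, so the
"header may not fit" problem above disappears.]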

I probably won't be able to reliably code all this tonight, although I
will give it a whirl!

I hope that if I address the other comments and keep the current
bass-ackwards code, you will consider accepting it, to allow me to bug-fix
it in the 2.6.30-rcX series. I promise to do so.

-->Andy
>
> --b.
> _______________________________________________
> pNFS mailing list
> pNFS@xxxxxxxxxxxxx
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
