Re: nfsd: managing pages under network I/O

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Oct 19, 2016, at 1:26 PM, bfields@xxxxxxxxxxxx wrote:
> 
> On Wed, Oct 12, 2016 at 01:42:26PM -0400, Chuck Lever wrote:
>> I'm studying the way that the ->recvfrom and ->sendto calls work
>> for RPC-over-RDMA.
>> 
>> The ->sendto path moves pages out of the svc_rqst before posting
>> I/O (RDMA Write and Send). Once the work is posted, ->sendto
>> returns, and looks like svc_rqst is released at that point. The
>> subsequent completion of the Send then releases those moved pages.
>> 
>> I'm wondering if the transport can be simplified: instead of
>> moving pages around, ->sendto could just wait until the Write and
>> Send activity is complete, then return. The upper layer then
>> releases everything.
> 
> I don't understand what problem you're trying to fix.  Is "moving pages
> around" really especially complicated or expensive?

Moving pages is complicated and undocumented, and adds host CPU and
memory costs (loads and stores). There is also a whole lot of page
allocator traffic in this code, and that will be enabling and
disabling interrupts and bottom-halves with some frequency.

The less involvement by the host CPU the better. The current RDMA
transport code documents the RPC-over-RDMA protocol to some extent,
but the need for moving the pages is to the reader's imagination.

>From my code audit:

The svc_rqst is essentially a static piece of memory, and the calling
NFSD kthread will re-use it immediately when svc_sendto() returns. It
cannot be manipulated asynchronously. The recvfrom calls and the
sendto call for one RPC request can each use a different svc_rqst,
for example. So if the transport needs to use pages for longer the
length of one svc_sendto call (for example, to wait for RDMA Write to
complete) then it has to move those pages out of the svc_rqst before
it returns. (Let me know if I've got this wrong).

I'm falling back to looking at ways to clean up and document the
movement of these pages so it is better understood why it is done.
There also could be some opportunity to share page movement helpers
between the RDMA read and write path (and perhaps the TCP transport
too).

HTH.


> --b.
> 
>> 
>> Another option would be for ->sendto to return a value that means
>> the transport will release the svc_rqst and pages.
>> 
>> Or, the svc_rqst could be reference counted.
>> 
>> Anyone have thoughts about this?


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux