----- Original Message -----
> From: "Anand Avati" <aavati@xxxxxxxxxx>
> To: "Amar Tumballi" <atumball@xxxxxxxxxx>
> Cc: bharata@xxxxxxxxxxxxxxxxxx, gluster-devel@xxxxxxxxxx, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Sent: Thursday, January 10, 2013 12:20:09 PM
> Subject: Re: zero-copy readv
>
> On 01/09/2013 10:37 PM, Amar Tumballi wrote:
> >
> >>
> >> - On the read side things are a little more complicated. In
> >> rpc-transport/socket, there is a call to iobuf_get() to create a new
> >> iobuf for reading in the readv reply data from the server. We will
> >> need framework changes where, if the readv request (of the xid for
> >> which the readv reply is being handled) happened to be a "direct"
> >> variant (i.e., zero-copy), then the "special iobuf around user's
> >> memory" gets picked up and read() from socket is performed directly
> >> into user's memory. Similar, but equivalent, changes will have to be
> >> done in RDMA (Raghavendra on CC can help). Since the goal is to avoid
> >> memory copy, this data will be bypassing io-cache (and purging
> >> pre-cached data of those regions along the way).
> >>
> >
> > On the read side too, our client protocol is designed to handle
> > 0-copy already, i.e., if the fop comes with an iobuf/iobref, then the
> > same buffer is used for copying the received data from the network
> > (client_submit_request() is designed to handle this). [1]
> >
> > We made all these changes to make RDMA 0-copy a possibility, so even
> > the RDMA transport should already be 0-copy friendly.
> >
> > That's my understanding.
> >
> > Regards,
> > Amar
> >
> > [1] - recent patches to handle RPC read-ahead may involve a small data
> > copy from the header to the data buffer, but surely not a large one.
> >
>
> Amar - note that the current infrastructure present for 0-copy RDMA
> might not be sufficient for GFAPI's 0-copy. A glfs_readv() request from
> the app can come as a vector of memory pointers (and not a contiguous
> iobuf) and therefore requires storing an iovec/count as well. This might
> also mean we need to exercise the scatter-gather aspects of the verbs
> API.

If we pass user-supplied vectors as write chunks to the server, it will do
RDMA writes to the memory regions pointed to by those vectors. So I think
no major changes are required in RDMA either.

>
> Avati
>