On Thu, Aug 19, 2010 at 6:35 PM, Venkateswararao Jujjuri (JV)
<jvrao@xxxxxxxxxxxxxxxxxx> wrote:
> Eric Van Hensbergen wrote:
>> Yeah, I haven't done a thorough critique of the code yet as I'm
>> overloaded with other things, but my overall impression is similar to
>> Lucho's. Last time I looked at doing this, my approach was to keep
>> everything zero copy in the fs and client layers, and if I needed to
>> allocate/copy in the transport because it didn't support
>> scatter/gather, I did it in the transport.
>>
>> Of course, this touches more than just the virtio transport, so I can
>> understand the trepidation at broader changes. But it seems like we
>> are doing backflips to try and avoid broader changes, and the result
>> is somewhat awkward due to the overloaded nature of many of the
>> fields -- in the end it is likely to lead to more problems.
>
> I understand that it is not very pretty. :) But the main goal was a
> modular and limited change.
> Unless we separate the header from the payload completely at the
> protocol level, I think it may be difficult to achieve zero copy
> cleanly.

I somewhat disagree -- the point is to separate the header from the
payload from the API perspective; it has little to do with the
protocol. While keeping changes modular and limited is a good goal,
there are certain things that are worth rethinking when we start
jumping through too many hoops. I think zero copy is something we need
to strive for in every transport where it's possible, and perhaps it
should be possible in all of them -- so it's worth doing right. This
is why I was holding off on optimizations until after we got the
basics of virtfs and .L in place: I knew they would involve broader
changes that would impact more of the code base, and so would take
more time to crack.

> I will be more than happy to take your suggestions on how to get it
> right cleanly.
> Increasing msize in the current model really doesn't solve the issue,
> as it forces us into bigger kmalloc()s even for simple transactions
> like TLCREATE/RLCREATE.

In situations where we are using zero copy, it may be possible to make
the per-packet allocated buffer IOHDRSIZE. It's something that's worth
looking at carefully -- IMHO it doesn't make sense to allocate msize
buffers if we are passing payloads by reference most of the time.

> Another area we need to look at is that the same structure, p9_fcall,
> is being used for both transmission and reception. Most of the time
> only one direction needs the larger memory, not the other.

This is almost definitely true, but it depends on what we are treating
as payload. Essentially, we only need something other than IOHDRSIZE
for a subset of operations (read, write, stat, and some of the new
ones in .L) -- if those operations also use sg lists, then we should
be able to do a better job.

        -eric
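For illustration only, here is a minimal sketch of the kind of
header/payload split described above, expressed at the API level. The
struct and field names below are hypothetical -- they appear nowhere
in the patch or the current client; this is just one way the idea
could be spelled:

#include <linux/types.h>
#include <linux/mm.h>

/*
 * Hypothetical sketch: keep the wire header in a small buffer (what
 * Eric calls IOHDRSIZE) and describe the payload separately, so a
 * transport with scatter/gather support can pass data by reference,
 * and a transport without it can do its allocate-and-copy privately.
 */
struct p9_payload {
        struct page **pages;    /* payload pages, passed by reference */
        int nr_pages;           /* entries in pages[] */
        size_t offset;          /* byte offset into the first page */
        size_t count;           /* total payload length in bytes */
};

struct p9_fcall_split {
        u8 *hdr;                /* small header buffer, not msize */
        size_t hdr_len;         /* bytes actually used in hdr */
        struct p9_payload payload; /* count == 0 for non-I/O requests */
};

With a split like this, nothing in the protocol changes: the header
still goes on the wire first, followed by the payload bytes; only the
in-kernel representation separates the two.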
>
>>
>> On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@xxxxxxxxxx> wrote:
>>> Frankly, I don't much like the way the zero copy is implemented.
>>>
>>> 1. In the case of a Twrite, the size[4] field of the 9P message
>>> contains the size of the message without the data field.
>>>
>>> 2. In the case of a Tread message, the pages to receive the data are
>>> contained in req->tc instead of req->rc.
>
> We had some internal debate on this before choosing req->tc. :)
> The argument being: whatever the client sends, it puts on TC, and the
> server responds back on RC.
> But I do understand your point too.
>
> Eric/Lucho,
>
> Please let me know if you have any suggestions to get this right.
>
> Thanks,
> JV
>
>>>
>>> What is the point of having the page data in p9_fcall, if only the
>>> one in req->tc is ever going to be used? I am trying to implement a
>>> transport that uses your patches, and I have to fill my code with
>>> explanations of why I am doing crazy things.
>>>
>>> Thanks,
>>> Lucho
>>>
>>> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Latchesar Ionkov wrote:
>>>>> It is kind of strange that part of the fcall packet (everything
>>>>> other than the Rread/Twrite data) is located in a buffer pointed
>>>>> to by sdata, and the rest is in pages pointed to by pdata. A
>>>>> scatterlist would make it tidier and easier to figure out what is
>>>>> where.
>>>> A separate sg list would further increase the size of the PDU.
>>>> Initially I had an sg list sitting in the fcall structure. But when
>>>> we use the page addresses directly, we do not make full use of
>>>> sdata -- only the initial portion is used, for the header.
>>>> To use kernel memory efficiently, we came up with this plan of
>>>> overloading sdata with page pointers during user-initiated
>>>> Rread/Twrite calls.
>>>>
>>>> The major advantage we saw with this method is that the changes are
>>>> very modular. Other parts of the code, and other transports, work
>>>> without a change.
>>>>
>>>> If needed, we can easily have a separate sg list vector in the
>>>> fcall, but it may not use kernel memory efficiently, as we would
>>>> have the whole sdata allocated but not used.
>>>>
>>>> Thanks,
>>>> JV
>>>>
>>>>> Thanks,
>>>>> Lucho
>>>>>
>>>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>> Latchesar Ionkov wrote:
>>>>>>> Is there any particular reason for p9_fcall to have pointers to
>>>>>>> pages instead of a scatterlist?
>>>>>> Given that page sizes are constant, all we need is the offset
>>>>>> into the first page; the I/O size determines the last page. So we
>>>>>> decided there was no need to put an sg list in the p9_fcall.
>>>>>>
>>>>>> Thanks,
>>>>>> JV
>>>>>>
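As a concrete illustration of the constant-page-size argument above
(the helper below is hypothetical, not part of the patch): the
first-page offset plus the I/O size are enough to recover the page
count, so no per-request scatterlist has to live in p9_fcall.

#include <linux/kernel.h>       /* DIV_ROUND_UP */
#include <linux/mm.h>           /* PAGE_SIZE */

/* Pages (and thus sg entries) spanned by an I/O that starts
 * first_page_off bytes into its first page. */
static inline int p9_payload_nr_pages(size_t first_page_off, size_t count)
{
        return DIV_ROUND_UP(first_page_off + count, PAGE_SIZE);
}

For example, with 4096-byte pages, a 4100-byte I/O starting 4092 bytes
into its first page spans DIV_ROUND_UP(4092 + 4100, 4096) = 2 pages:
4 bytes in the first page and 4096 in the second.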
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lucho
>>>>>>>
>>>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>> This patch adds the necessary infrastructure for placing page
>>>>>>>> addresses directly on the sg list for the server to consume.
>>>>>>>>
>>>>>>>> The newly added routine pack_sg_list_p() is just like
>>>>>>>> pack_sg_list() except that it takes a page array as input and
>>>>>>>> places the pages directly on the sg list after taking care of
>>>>>>>> the first page offset.
>>>>>>>>
>>>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@xxxxxxxxxxxxxxxxxx>
>>>>>>>> Signed-off-by: Badari Pulavarty <pbadari@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>>>  net/9p/client.c       |    4 +++
>>>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>>>> index a8de812..382ef22 100644
>>>>>>>> --- a/include/net/9p/9p.h
>>>>>>>> +++ b/include/net/9p/9p.h
>>>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>>>
>>>>>>>>         size_t offset;
>>>>>>>>         size_t capacity;
>>>>>>>> -
>>>>>>>> +       struct page **pdata;
>>>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>>>> +       uint32_t pdata_off;
>>>>>>>> +       uint32_t pdata_write_len;
>>>>>>>> +       uint32_t pdata_read_len;
>>>>>>>>         uint8_t *sdata;
>>>>>>>>  };
>>>>>>>>
>>>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>>>> index 29bbbbd..5487896 100644
>>>>>>>> --- a/net/9p/client.c
>>>>>>>> +++ b/net/9p/client.c
>>>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>>>                 }
>>>>>>>>                 req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>>>                 req->tc->capacity = c->msize;
>>>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>>>                 req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>>>                 req->rc->capacity = c->msize;
>>>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>>>         }
>>>>>>>>
>>>>>>>>         p9pdu_reset(req->tc);
>>>>>>>>
>>>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>>>> index 762c19f..8f86cb5 100644
>>>>>>>> --- a/net/9p/trans_virtio.c
>>>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>>>         return index-start;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>>>> + * @sg: scatter/gather list to pack into
>>>>>>>> + * @start: which segment of the sg_list to start at
>>>>>>>> + * @limit: maximum segment to pack data to
>>>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>>>> + *
>>>>>>>> + * This is just like pack_sg_list() except that it takes a page array
>>>>>>>> + * as input and directly places the pages on the sg list after taking
>>>>>>>> + * care of the first page offset.
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>>>> +{
>>>>>>>> +       int s;
>>>>>>>> +       int i = 0;
>>>>>>>> +       int index = start;
>>>>>>>> +
>>>>>>>> +       if (pdu->pdata_off) {
>>>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       while (count) {
>>>>>>>> +               BUG_ON(index > limit);
>>>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return index-start;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +
>>>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  static int
>>>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> -       int in, out;
>>>>>>>> +       int in, out, outp, inp;
>>>>>>>>         struct virtio_chan *chan = client->trans;
>>>>>>>>         char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>>>
>>>>>>>>         P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>>>
>>>>>>>>         out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>>>> -                       req->tc->size);
>>>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>>>> +                       req->tc->size);
>>>>>>>> +
>>>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>>>> +
>>>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>>>> +                               req->tc, req->tc->pdata_write_len);
>>>>>>>> +               out += outp;
>>>>>>>> +       }
>>>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>>>> +                               req->tc, req->tc->pdata_read_len);
>>>>>>>> +               in += inp;
>>>>>>>> +       } else {
>>>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>>>                                 client->msize);
>>>>>>>> +       }
>>>>>>>>
>>>>>>>>         req->status = REQ_STATUS_SENT;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.6.5.2
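For readers following the patch, here is a rough sketch of how a write
path might populate the new pdata fields that pack_sg_list_p() and
p9_virtio_request() consume. This is an illustration under assumptions,
not code from the series: the function name is made up, a short pin
from get_user_pages_fast() is not handled, and the pinned pages would
still need to be released with put_page() on completion.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <net/9p/9p.h>

/* Hypothetical caller: pin the user buffer behind a Twrite and
 * describe it in the request's tc fcall, so the virtio transport can
 * pass the pages by reference instead of copying into sdata. */
static int p9_setup_write_payload(struct p9_fcall *tc,
                                  const char __user *ubuf, size_t count)
{
        unsigned long addr = (unsigned long)ubuf;
        int nr_pages = DIV_ROUND_UP((addr & ~PAGE_MASK) + count, PAGE_SIZE);
        int pinned;

        tc->pdata = kmalloc(nr_pages * sizeof(struct page *), GFP_KERNEL);
        if (!tc->pdata)
                return -ENOMEM;

        tc->pdata_off = addr & ~PAGE_MASK;  /* offset into first page */

        /* write=0: the device only reads these pages */
        pinned = get_user_pages_fast(addr, nr_pages, 0, tc->pdata);
        if (pinned < 0) {
                kfree(tc->pdata);
                return pinned;
        }
        tc->pdata_mapped_pages = pinned;
        tc->pdata_write_len = count;    /* selects pack_sg_list_p() */
        return 0;
}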