Eric Van Hensbergen wrote: > Yeah, I haven't done a thorough critique of the code yet as I'm > overloaded with other things, but my overall impression is similar to > lucho's. Last time I looked at doing this my approach was to keep > everything zero copy in the fs and client layer and if I needed to > allocate/copy in the transport because it didn't support > scatter/gather I did it in the transport. > > Of course, this touches more than just the virtio transport and so I > can understand the trepidation at broader changes, but it seems like > we are doing backflips to try and avoid broader changes and the result > is somewhat awkward due the overloaded nature of many of the fields -- > in the end it is likely to lead to more problems. I understand that it is not very pretty.. :) but the main motto was modular and limited change. Unless we separate the header from payload completely at the protocol level, I think it may be difficult to achieve zero copy cleanly. I will be more than happy to take your suggestions on how to get it right cleanly. Increasing msize in the current model really doesn't solve the issue, as it forces us to bigger kmalloc()s even for simple transactions like TLCREATE/RLCREATE. Another area where we need to look at is, the same structure, 9p_fcall is being used for both transmission and reception. Most of the times only one way will need larger memory, not the other way. > > On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@xxxxxxxxxx> wrote: >> Frankly, I don't like the way the zero copy is implemented much. >> >> 1. In case of Twrite, the size[4] field of the 9P message contains the >> size of the message without the data field. >> >> 2. In case of Tread message, the pages to receive the data are >> contained in req->tc instead of req->rc. We had some debate internally, on this before choosing req->tc. :) The argument being, whatever client sends, it puts it on TC, and server responds back on RC. But I do understand your point too. Eric/Lucho, Please let me know if you have any suggestions to get this right. Thanks, JV >> >> What is the point of having the page data in p9_fcall, if only the one >> in req->tc is ever going to be used? I am trying to implement a >> transport that uses your patches, I have to fill my code with >> explanations why I am doing crazy things. >> >> Thanks, >> Lucho >> >> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV) >> <jvrao@xxxxxxxxxxxxxxxxxx> wrote: >>> Latchesar Ionkov wrote: >>>> It is kind of strange that part of the fcall packet (everything other >>>> than the Rread/Twrite data) is located in a buffer pointed by sdata, >>>> and the rest is in pages pointed by pdata. A scatterlist would make it >>>> tidier and easier to figure out what is where. >>> A separate sg list will further increase the size of PDU. Initially I had a sg list >>> sitting in the fcall structure. But when we are using the page address directly, >>> we are not making use of sdata completely ...only initial portion is used for >>> the header. >>> To make use of kernel memory efficiently, we came up with this plan of overloading >>> sdata with page pointers during the user initiated Rread/Twrite calls. >>> >>> The major advantage we saw with this method is, changes are very modular. >>> Other parts of the code, and other transports work without a change. >>> >>> If needed, we can easily have a separate sg list vector in the fcall, but it may >>> not be using >>> the kernel memory efficiently as we have the whole sdata allocated but not being >>> used. >>> >>> Thanks, >>> JV >>> >>>> Thanks, >>>> Lucho >>>> >>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV) >>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote: >>>>> Latchesar Ionkov wrote: >>>>>> Is there any particular reason p9_fcall to have pointers to pages >>>>>> instead of a scatterlist? >>>>> Given that page sizes are constant, all we need is offset into the first page. >>>>> IO size determines the last page. So we decided no need to put sg list in the >>>>> p9_fcall. >>>>> >>>>> Thanks, >>>>> JV >>>>> >>>>>> Thanks, >>>>>> Lucho >>>>>> >>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV) >>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote: >>>>>>> This patch adds necessary infrastructure for placing page addresses >>>>>>> directly on the sg list for the server to consume. >>>>>>> >>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list() >>>>>>> except that it takes page array as an input and directly places them on >>>>>>> the sg list after taking care of the first page offset. >>>>>>> >>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@xxxxxxxxxxxxxxxxxx> >>>>>>> Signed-off-by: Badari Pulavarty <pbadari@xxxxxxxxxx> >>>>>>> --- >>>>>>> include/net/9p/9p.h | 6 ++++- >>>>>>> net/9p/client.c | 4 +++ >>>>>>> net/9p/trans_virtio.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++-- >>>>>>> 3 files changed, 65 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h >>>>>>> index a8de812..382ef22 100644 >>>>>>> --- a/include/net/9p/9p.h >>>>>>> +++ b/include/net/9p/9p.h >>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall { >>>>>>> >>>>>>> size_t offset; >>>>>>> size_t capacity; >>>>>>> - >>>>>>> + struct page **pdata; >>>>>>> + uint32_t pdata_mapped_pages; >>>>>>> + uint32_t pdata_off; >>>>>>> + uint32_t pdata_write_len; >>>>>>> + uint32_t pdata_read_len; >>>>>>> uint8_t *sdata; >>>>>>> }; >>>>>>> >>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c >>>>>>> index 29bbbbd..5487896 100644 >>>>>>> --- a/net/9p/client.c >>>>>>> +++ b/net/9p/client.c >>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag) >>>>>>> } >>>>>>> req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall); >>>>>>> req->tc->capacity = c->msize; >>>>>>> + req->tc->pdata_write_len = 0; >>>>>>> + req->tc->pdata_read_len = 0; >>>>>>> req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall); >>>>>>> req->rc->capacity = c->msize; >>>>>>> + req->rc->pdata_write_len = 0; >>>>>>> + req->rc->pdata_read_len = 0; >>>>>>> } >>>>>>> >>>>>>> p9pdu_reset(req->tc); >>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c >>>>>>> index 762c19f..8f86cb5 100644 >>>>>>> --- a/net/9p/trans_virtio.c >>>>>>> +++ b/net/9p/trans_virtio.c >>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data, >>>>>>> return index-start; >>>>>>> } >>>>>>> >>>>>>> +/** >>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages. >>>>>>> + * @sg: scatter/gather list to pack into >>>>>>> + * @start: which segment of the sg_list to start at >>>>>>> + * @limit: maximum segment to pack data to >>>>>>> + * @pdu: pdu prepared to put on the wire. >>>>>>> + * @count: amount of data to pack into the scatter/gather list >>>>>>> + * >>>>>>> + * This is just like pack_sg_list() except that it takes page array >>>>>>> + * as an input and directly places them on the sg list after taking >>>>>>> + * care of the first page offset. >>>>>>> + */ >>>>>>> + >>>>>>> +static int >>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit, >>>>>>> + struct p9_fcall *pdu, int count) >>>>>>> +{ >>>>>>> + int s; >>>>>>> + int i = 0; >>>>>>> + int index = start; >>>>>>> + >>>>>>> + if (pdu->pdata_off) { >>>>>>> + s = min((int)(PAGE_SIZE - pdu->pdata_off), count); >>>>>>> + sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off); >>>>>>> + count -= s; >>>>>>> + } >>>>>>> + >>>>>>> + while (count) { >>>>>>> + BUG_ON(index > limit); >>>>>>> + s = min((int)PAGE_SIZE, count); >>>>>>> + sg_set_page(&sg[index++], pdu->pdata[i++], s, 0); >>>>>>> + count -= s; >>>>>>> + } >>>>>>> + >>>>>>> + return index-start; >>>>>>> +} >>>>>>> + >>>>>>> + >>>>>>> /* We don't currently allow canceling of virtio requests */ >>>>>>> static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req) >>>>>>> { >>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req) >>>>>>> static int >>>>>>> p9_virtio_request(struct p9_client *client, struct p9_req_t *req) >>>>>>> { >>>>>>> - int in, out; >>>>>>> + int in, out, outp, inp; >>>>>>> struct virtio_chan *chan = client->trans; >>>>>>> char *rdata = (char *)req->rc+sizeof(struct p9_fcall); >>>>>>> >>>>>>> P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n"); >>>>>>> >>>>>>> out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata, >>>>>>> - req->tc->size); >>>>>>> - in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata, >>>>>>> + req->tc->size); >>>>>>> + >>>>>>> + BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len); >>>>>>> + >>>>>>> + if (req->tc->pdata_write_len) { >>>>>>> + outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM, >>>>>>> + req->tc, req->tc->pdata_write_len); >>>>>>> + out += outp; >>>>>>> + } >>>>>>> + if (req->tc->pdata_read_len) { >>>>>>> + inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11); >>>>>>> + in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM, >>>>>>> + req->tc, req->tc->pdata_read_len); >>>>>>> + in += inp; >>>>>>> + } else { >>>>>>> + in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, >>>>>>> client->msize); >>>>>>> + } >>>>>>> >>>>>>> req->status = REQ_STATUS_SENT; >>>>>>> >>>>>>> -- >>>>>>> 1.6.5.2 >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> This SF.net email is sponsored by >>>>>>> >>>>>>> Make an app they can't live without >>>>>>> Enter the BlackBerry Developer Challenge >>>>>>> http://p.sf.net/sfu/RIM-dev2dev >>>>>>> _______________________________________________ >>>>>>> V9fs-developer mailing list >>>>>>> V9fs-developer@xxxxxxxxxxxxxxxxxxxxx >>>>>>> https://lists.sourceforge.net/lists/listinfo/v9fs-developer >>>>>>> >>>>> >>> >>> >> ------------------------------------------------------------------------------ >> This SF.net email is sponsored by >> >> Make an app they can't live without >> Enter the BlackBerry Developer Challenge >> http://p.sf.net/sfu/RIM-dev2dev >> _______________________________________________ >> V9fs-developer mailing list >> V9fs-developer@xxxxxxxxxxxxxxxxxxxxx >> https://lists.sourceforge.net/lists/listinfo/v9fs-developer >> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html