Yeah, I haven't done a thorough critique of the code yet, as I'm
overloaded with other things, but my overall impression is similar to
lucho's.

Last time I looked at doing this, my approach was to keep everything
zero-copy in the fs and client layers; if I needed to allocate/copy in
the transport because it didn't support scatter/gather, I did it in the
transport. Of course, this touches more than just the virtio transport,
so I can understand the trepidation at broader changes -- but it seems
like we are doing backflips to try and avoid them, and the result is
somewhat awkward due to the overloaded nature of many of the fields. In
the end it is likely to lead to more problems.
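To make that concrete, here is roughly the shape I mean -- a sketch
only, with made-up names (dumb_trans and its send hook), not code from
this patch or from the tree. The point is that the allocate-and-copy,
if one is needed at all, stays private to the one transport that can't
do scatter/gather:

	/* sketch; would need <linux/slab.h>, <linux/string.h>, <linux/uio.h> */
	struct dumb_trans {
		/* hypothetical send hook of a transport with no sg support */
		int (*send)(struct dumb_trans *t, const char *buf, size_t len);
	};

	static int dumb_trans_write(struct dumb_trans *t, struct kvec *iov,
				    int nr_segs, size_t len)
	{
		char *buf;
		size_t off = 0;
		int i, err;

		buf = kmalloc(len, GFP_NOFS);
		if (!buf)
			return -ENOMEM;

		/* linearize the zero-copy payload; the fs and client
		 * layers never see this copy */
		for (i = 0; i < nr_segs; i++) {
			memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
			off += iov[i].iov_len;
		}

		err = t->send(t, buf, len);
		kfree(buf);
		return err;
	}

The fs and client layers hand every transport the same iovec-style
description of the payload; only the transport that can't do sg pays
for the copy.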
-eric

On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@xxxxxxxxxx> wrote:
> Frankly, I don't much like the way the zero copy is implemented.
>
> 1. In the case of a Twrite, the size[4] field of the 9P message
> contains the size of the message without the data field.
>
> 2. In the case of a Tread message, the pages to receive the data are
> contained in req->tc instead of req->rc.
>
> What is the point of having page data in both p9_fcall structures if
> only the one in req->tc is ever going to be used? I am trying to
> implement a transport that uses your patches, and I have to fill my
> code with explanations of why I am doing crazy things.
>
> Thanks,
>    Lucho
>
> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>> Latchesar Ionkov wrote:
>>> It is kind of strange that part of the fcall packet (everything other
>>> than the Rread/Twrite data) is located in a buffer pointed to by
>>> sdata, and the rest is in pages pointed to by pdata. A scatterlist
>>> would make it tidier and easier to figure out what is where.
>>
>> A separate sg list would further increase the size of the PDU.
>> Initially I had an sg list sitting in the fcall structure. But when we
>> use the page addresses directly, we don't make full use of sdata --
>> only the initial portion is used, for the header. To use kernel memory
>> efficiently, we came up with this plan of overloading sdata with page
>> pointers during user-initiated Rread/Twrite calls.
>>
>> The major advantage we saw with this method is that the changes are
>> very modular. Other parts of the code, and the other transports, work
>> without a change.
>>
>> If needed, we can easily have a separate sg list vector in the fcall,
>> but it may not use kernel memory efficiently, since the whole of sdata
>> would still be allocated but largely unused.
>>
>> Thanks,
>> JV
>>
>>> Thanks,
>>> Lucho
>>>
>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Latchesar Ionkov wrote:
>>>>> Is there any particular reason for p9_fcall to have pointers to
>>>>> pages instead of a scatterlist?
>>>> Given that page sizes are constant, all we need is the offset into
>>>> the first page; the I/O size determines the last page. So we decided
>>>> there was no need to put an sg list in p9_fcall.
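>>>>
>>>> For example (a rough sketch of the arithmetic only -- this helper
>>>> name is made up, it is not in the patch):
>>>>
>>>>         /* pages spanned by an I/O of @count bytes that starts at
>>>>          * byte @pdata_off within the first page */
>>>>         static int nr_pages_needed(uint32_t pdata_off, uint32_t count)
>>>>         {
>>>>                 return (pdata_off + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>>>         }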
>>>>
>>>> Thanks,
>>>> JV
>>>>
>>>>> Thanks,
>>>>> Lucho
>>>>>
>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>> This patch adds the necessary infrastructure for placing page
>>>>>> addresses directly on the sg list for the server to consume.
>>>>>>
>>>>>> The newly added routine pack_sg_list_p() is just like
>>>>>> pack_sg_list() except that it takes a page array as input and
>>>>>> places the pages directly on the sg list, after taking care of the
>>>>>> first page offset.
>>>>>>
>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@xxxxxxxxxxxxxxxxxx>
>>>>>> Signed-off-by: Badari Pulavarty <pbadari@xxxxxxxxxx>
>>>>>> ---
>>>>>>  include/net/9p/9p.h   |  6 ++++-
>>>>>>  net/9p/client.c       |  4 +++
>>>>>>  net/9p/trans_virtio.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>> index a8de812..382ef22 100644
>>>>>> --- a/include/net/9p/9p.h
>>>>>> +++ b/include/net/9p/9p.h
>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>
>>>>>>         size_t offset;
>>>>>>         size_t capacity;
>>>>>> -
>>>>>> +       struct page **pdata;
>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>> +       uint32_t pdata_off;
>>>>>> +       uint32_t pdata_write_len;
>>>>>> +       uint32_t pdata_read_len;
>>>>>>         uint8_t *sdata;
>>>>>>  };
>>>>>>
>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>> index 29bbbbd..5487896 100644
>>>>>> --- a/net/9p/client.c
>>>>>> +++ b/net/9p/client.c
>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>                 }
>>>>>>                 req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>                 req->tc->capacity = c->msize;
>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>                 req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>                 req->rc->capacity = c->msize;
>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>         }
>>>>>>
>>>>>>         p9pdu_reset(req->tc);
>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>> index 762c19f..8f86cb5 100644
>>>>>> --- a/net/9p/trans_virtio.c
>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>         return index-start;
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>> + * @sg: scatter/gather list to pack into
>>>>>> + * @start: which segment of the sg_list to start at
>>>>>> + * @limit: maximum segment to pack data to
>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>> + *
>>>>>> + * This is just like pack_sg_list() except that it takes a page array
>>>>>> + * as input and places the pages directly on the sg list,
>>>>>> + * after taking care of the first page offset.
>>>>>> + */
>>>>>> +
>>>>>> +static int
>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>> +              struct p9_fcall *pdu, int count)
>>>>>> +{
>>>>>> +       int s;
>>>>>> +       int i = 0;
>>>>>> +       int index = start;
>>>>>> +
>>>>>> +       if (pdu->pdata_off) {
>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>> +               count -= s;
>>>>>> +       }
>>>>>> +
>>>>>> +       while (count) {
>>>>>> +               BUG_ON(index > limit);
>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>> +               count -= s;
>>>>>> +       }
>>>>>> +
>>>>>> +       return index-start;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>  {
>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>  static int
>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>  {
>>>>>> -       int in, out;
>>>>>> +       int in, out, outp, inp;
>>>>>>         struct virtio_chan *chan = client->trans;
>>>>>>         char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>
>>>>>>         P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>
>>>>>>         out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>> -                       req->tc->size);
>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>> +                          req->tc->size);
>>>>>> +
>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>> +
>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>> +                                     req->tc, req->tc->pdata_write_len);
>>>>>> +               out += outp;
>>>>>> +       }
>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>> +                                   req->tc, req->tc->pdata_read_len);
>>>>>> +               in += inp;
>>>>>> +       } else {
>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>                                   client->msize);
>>>>>> +       }
>>>>>>
>>>>>>         req->status = REQ_STATUS_SENT;
>>>>>>
>>>>>> --
>>>>>> 1.6.5.2
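
One more note for whoever reads the patch after me: the bare 11 passed
to pack_sg_list() in p9_virtio_request() is evidently the on-the-wire
size of the Rread reply header that precedes the payload -- size[4]
type[1] tag[2] count[4] -- so the header lands in sdata while
data[count] goes straight into the caller's pages. A sketch (the macro
is made up, not something the patch defines):

	/* Rread reply: size[4] type[1] tag[2] count[4] data[count];
	 * everything before data[] is 4 + 1 + 2 + 4 = 11 bytes */
	#define P9_RREAD_HDR_SIZE	(4 + 1 + 2 + 4)

That magic number is exactly the kind of thing a named constant -- or
Lucho's scatterlist suggestion -- would make self-explanatory.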