Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.

On Thu, Aug 19, 2010 at 6:35 PM, Venkateswararao Jujjuri (JV)
<jvrao@xxxxxxxxxxxxxxxxxx> wrote:
> Eric Van Hensbergen wrote:
>> Yeah, I haven't done a thorough critique of the code yet as I'm
>> overloaded with other things, but my overall impression is similar to
>> lucho's.  Last time I looked at doing this my approach was to keep
>> everything zero copy in the fs and client layer and if I needed to
>> allocate/copy in the transport because it didn't support
>> scatter/gather I did it in the transport.
>>
>> Of course, this touches more than just the virtio transport and so I
>> can understand the trepidation at broader changes, but it seems like
>> we are doing backflips to try and avoid broader changes and the result
>> is somewhat awkward due to the overloaded nature of many of the fields --
>> in the end it is likely to lead to more problems.
>
> I understand that it is not very pretty... :) but the main goal was a modular and
> limited change.
> Unless we separate the header from payload completely at the protocol level,
> I think it may be difficult to achieve zero copy cleanly.
>

I somewhat disagree - the point is to separate the header from payload
from the API perspective -- it has little to do with the protocol.
While keeping changes modular and limited is a good goal, there
are certain things that are worth rethinking when we find ourselves
jumping through too many hoops.  I think zero copy is something we need
to strive for in every transport where it's possible, and perhaps it
should be possible in all of them -- so it's worth doing right.  This
is why I was holding off on optimizations until after we got the
basics of virtfs and .L in place, because I knew they would involve
broader changes that would impact more of the code base and so would
take more time to crack.

>
> I will be more than happy to take your suggestions on how to get it right cleanly.
> Increasing msize in the current model really doesn't solve the issue, as it forces
> us into bigger kmalloc()s even for simple transactions like TLCREATE/RLCREATE.
>

In situations where we are using zero copy, it may be possible to make
the per-packet allocated buffer IOHDRSIZE.  It's something that's worth
looking at carefully -- IMHO it doesn't make sense to allocate msize
buffers if we are passing payloads by reference most of the time.
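Roughly what I have in mind, as a sketch (the helper name and its shape are made up for illustration; the kernel's actual header-size macro is P9_IOHDRSZ, value 24):

```c
#include <assert.h>
#include <stddef.h>

#define P9_IOHDRSZ 24   /* 9P i/o header size, as in include/net/9p/9p.h */

/*
 * Hypothetical helper: when the payload is passed by reference (zero
 * copy), a header-sized buffer is enough; otherwise fall back to the
 * negotiated msize.
 */
static size_t p9_buf_size(int zero_copy, size_t msize)
{
	return zero_copy ? P9_IOHDRSZ : msize;
}
```

With msize = 8192, a zero-copy request would then allocate 24 bytes per packet instead of 8192.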

>
> Another area we need to look at is that the same structure, p9_fcall, is
> used for both transmission and reception. Most of the time only one
> direction needs the larger memory, not the other.
>

This is almost definitely true, but it depends on what we are treating
as payload.  Essentially, we only need something other than IOHDRSIZE
for a subset of operations (read, write, stat, and some of the new
ones in .L) -- if those operations also use sg-lists then we should be
able to do a better job.
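To illustrate the subset idea (sketch only -- the predicate is made up; the opcode values are the standard 9P2000 numbering from include/net/9p/9p.h):

```c
#include <assert.h>

/* Standard 9P2000 opcode values (see include/net/9p/9p.h). */
enum {
	P9_TREAD  = 116,
	P9_RREAD  = 117,
	P9_TWRITE = 118,
	P9_TCLUNK = 120,
	P9_RSTAT  = 125,
	P9_TWSTAT = 126,
};

/*
 * Hypothetical predicate: does this operation carry a payload that may
 * exceed an IOHDRSIZE buffer and so would take the sg-list path?
 * Everything else fits in a small header-sized buffer.
 */
static int p9_op_has_payload(int type)
{
	switch (type) {
	case P9_TWRITE:
	case P9_RREAD:
	case P9_RSTAT:
	case P9_TWSTAT:
		return 1;
	default:
		return 0;
	}
}
```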

        -eric


>
>>
>> On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@xxxxxxxxxx> wrote:
>>> Frankly, I don't like the way the zero copy is implemented much.
>>>
>>> 1. In case of Twrite, the size[4] field of the 9P message contains the
>>> size of the message without the data field.
>>>
>>> 2. In case of Tread message, the pages to receive the data are
>>> contained in req->tc instead of req->rc.
>
> We had some debate internally on this before choosing req->tc. :)
> The argument being: whatever the client sends, it puts on tc, and the
> server responds back on rc.
> But I do understand your point too.
>
> Eric/Lucho,
>
> Please let me know if you have any suggestions to get this right.
>
> Thanks,
> JV
>
>>>
>>> What is the point of having the page data in p9_fcall, if only the one
>>> in req->tc is ever going to be used? I am trying to implement a
>>> transport that uses your patches, and I have to fill my code with
>>> explanations of why I am doing crazy things.
>>>
>>> Thanks,
>>>    Lucho
>>>
>>> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Latchesar Ionkov wrote:
>>>>> It is kind of strange that part of the fcall packet (everything other
>>>>> than the Rread/Twrite data) is located in a buffer pointed by sdata,
>>>>> and the rest is in pages pointed by pdata. A scatterlist would make it
>>>>> tidier and easier to figure out what is where.
>>>> A separate sg list would further increase the size of the PDU. Initially I had
>>>> an sg list sitting in the fcall structure. But when we use the page addresses
>>>> directly, we are not making full use of sdata... only the initial portion is
>>>> used for the header.
>>>> To use kernel memory efficiently, we came up with this plan of overloading
>>>> sdata with page pointers during user-initiated Rread/Twrite calls.
>>>>
>>>> The major advantage we saw with this method is, changes are very modular.
>>>> Other parts of the code, and other transports work without a change.
>>>>
>>>> If needed, we can easily have a separate sg list vector in the fcall, but it
>>>> may not use kernel memory efficiently, as the whole sdata would be allocated
>>>> but not used.
>>>>
>>>> Thanks,
>>>> JV
>>>>
>>>>> Thanks,
>>>>>     Lucho
>>>>>
>>>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>> Latchesar Ionkov wrote:
>>>>>>> Is there any particular reason p9_fcall to have pointers to pages
>>>>>>> instead of a scatterlist?
>>>>>> Given that page sizes are constant, all we need is the offset into the first
>>>>>> page; the IO size determines the last page. So we decided there is no need to
>>>>>> put an sg list in p9_fcall.
>>>>>>
>>>>>> Thanks,
>>>>>> JV
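(For reference, a sketch of that arithmetic -- the helper name is made up, and a 4 KiB page size is assumed for the example: with a constant page size, the first-page offset plus the i/o count fully determine how many page pointers are needed, so no per-segment sg list is required in the fcall.)

```c
#include <assert.h>

#define PAGE_SIZE 4096UL  /* assuming 4 KiB pages for this sketch */

/*
 * Hypothetical helper: the offset into the first page and the i/o
 * count are enough to recover the number of pages spanned -- no
 * per-segment lengths needed.
 */
static unsigned long nr_pages_for_io(unsigned long off, unsigned long count)
{
	return (off + count + PAGE_SIZE - 1) / PAGE_SIZE;
}
```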
>>>>>>
>>>>>>> Thanks,
>>>>>>>     Lucho
>>>>>>>
>>>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>>>> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>>>>> directly on the sg list for the server to consume.
>>>>>>>>
>>>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>>>>> except that it takes page array as an input and directly places them on
>>>>>>>> the sg list after taking care of the first page offset.
>>>>>>>>
>>>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@xxxxxxxxxxxxxxxxxx>
>>>>>>>> Signed-off-by: Badari Pulavarty <pbadari@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>>>  net/9p/client.c       |    4 +++
>>>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>>>> index a8de812..382ef22 100644
>>>>>>>> --- a/include/net/9p/9p.h
>>>>>>>> +++ b/include/net/9p/9p.h
>>>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>>>
>>>>>>>>        size_t offset;
>>>>>>>>        size_t capacity;
>>>>>>>> -
>>>>>>>> +       struct page **pdata;
>>>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>>>> +       uint32_t pdata_off;
>>>>>>>> +       uint32_t pdata_write_len;
>>>>>>>> +       uint32_t pdata_read_len;
>>>>>>>>        uint8_t *sdata;
>>>>>>>>  };
>>>>>>>>
>>>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>>>> index 29bbbbd..5487896 100644
>>>>>>>> --- a/net/9p/client.c
>>>>>>>> +++ b/net/9p/client.c
>>>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>>>                }
>>>>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>>>                req->tc->capacity = c->msize;
>>>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>>>                req->rc->capacity = c->msize;
>>>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        p9pdu_reset(req->tc);
>>>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>>>> index 762c19f..8f86cb5 100644
>>>>>>>> --- a/net/9p/trans_virtio.c
>>>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>>>        return index-start;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>>>> + * @sg: scatter/gather list to pack into
>>>>>>>> + * @start: which segment of the sg_list to start at
>>>>>>>> + * @limit: maximum segment to pack data to
>>>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>>>> + *
>>>>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>>>>> + * as an input and directly places them on the sg list after taking
>>>>>>>> + * care of the first page offset.
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>>>> +{
>>>>>>>> +       int s;
>>>>>>>> +       int i = 0;
>>>>>>>> +       int index = start;
>>>>>>>> +
>>>>>>>> +       if (pdu->pdata_off) {
>>>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       while (count) {
>>>>>>>> +               BUG_ON(index > limit);
>>>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return index-start;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +
>>>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  static int
>>>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> -       int in, out;
>>>>>>>> +       int in, out, outp, inp;
>>>>>>>>        struct virtio_chan *chan = client->trans;
>>>>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>>>
>>>>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>>>
>>>>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>>>> -                                                               req->tc->size);
>>>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>>>> +                       req->tc->size);
>>>>>>>> +
>>>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>>>> +
>>>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>>>>> +               out += outp;
>>>>>>>> +       }
>>>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>>>>> +               in += inp;
>>>>>>>> +       } else {
>>>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>>>                                                                client->msize);
>>>>>>>> +       }
>>>>>>>>
>>>>>>>>        req->status = REQ_STATUS_SENT;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.6.5.2
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> V9fs-developer mailing list
>>>>>>>> V9fs-developer@xxxxxxxxxxxxxxxxxxxxx
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/v9fs-developer
>>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html