Re: [RFC 1/5] vhost: reorder used descriptors in a batch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




----- Original Message -----
> From: "jasowang" <jasowang@xxxxxxxxxx>
> To: "Guo Zhi" <qtxuning1999@xxxxxxxxxxx>
> Cc: "eperezma" <eperezma@xxxxxxxxxx>, "sgarzare" <sgarzare@xxxxxxxxxx>, "Michael Tsirkin" <mst@xxxxxxxxxx>, "netdev"
> <netdev@xxxxxxxxxxxxxxx>, "linux-kernel" <linux-kernel@xxxxxxxxxxxxxxx>, "kvm list" <kvm@xxxxxxxxxxxxxxx>,
> "virtualization" <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>
> Sent: Thursday, August 4, 2022 1:04:16 PM
> Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch

> On Tue, Aug 2, 2022 at 10:12 PM Guo Zhi <qtxuning1999@xxxxxxxxxxx> wrote:
>>
>>
>>
>> ----- Original Message -----
>> > From: "jasowang" <jasowang@xxxxxxxxxx>
>> > To: "Guo Zhi" <qtxuning1999@xxxxxxxxxxx>
>> > Cc: "eperezma" <eperezma@xxxxxxxxxx>, "sgarzare" <sgarzare@xxxxxxxxxx>, "Michael
>> > Tsirkin" <mst@xxxxxxxxxx>, "netdev"
>> > <netdev@xxxxxxxxxxxxxxx>, "linux-kernel" <linux-kernel@xxxxxxxxxxxxxxx>, "kvm
>> > list" <kvm@xxxxxxxxxxxxxxx>,
>> > "virtualization" <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>
>> > Sent: Friday, July 29, 2022 3:32:02 PM
>> > Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch
>>
>> > On Thu, Jul 28, 2022 at 4:26 PM Guo Zhi <qtxuning1999@xxxxxxxxxxx> wrote:
>> >>
>> >> On 2022/7/26 15:36, Jason Wang wrote:
>> >>
>> >>
>> >> 在 2022/7/21 16:43, Guo Zhi 写道:
>> >>
>> >> Device may not use descriptors in order, for example, NIC and SCSI may
>> >> not call __vhost_add_used_n with buffers in order.  It's the task of
>> >> __vhost_add_used_n to order them.
>> >>
>> >>
>> >>
>> >> I'm not sure this is ture. Having ooo descriptors is probably by design to have
>> >> better performance.
>> >>
>> >> This might be obvious for device that may have elevator or QOS stuffs.
>> >>
>> >> I suspect the right thing to do here is, for the device that can't perform
>> >> better in the case of IN_ORDER, let's simply not offer IN_ORDER (zerocopy or
>> >> scsi). And for the device we know it can perform better, non-zercopy ethernet
>> >> device we can do that.
>> >>
>> >>
>> >>   This commit reorder the buffers using
>> >> vq->heads, only the batch is begin from the expected start point and is
>> >> continuous can the batch be exposed to driver.  And only writing out a
>> >> single used ring for a batch of descriptors, according to VIRTIO 1.1
>> >> spec.
>> >>
>> >>
>> >>
>> >> So this sounds more like a "workaround" of the device that can't consume buffer
>> >> in order, I suspect it can help in performance.
>> >>
>> >> More below.
>> >>
>> >>
>> >>
>> >> Signed-off-by: Guo Zhi <qtxuning1999@xxxxxxxxxxx>
>> >> ---
>> >>   drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++++++++++++++++--
>> >>   drivers/vhost/vhost.h |  3 +++
>> >>   2 files changed, 45 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> >> index 40097826c..e2e77e29f 100644
>> >> --- a/drivers/vhost/vhost.c
>> >> +++ b/drivers/vhost/vhost.c
>> >> @@ -317,6 +317,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>> >>       vq->used_flags = 0;
>> >>       vq->log_used = false;
>> >>       vq->log_addr = -1ull;
>> >> +    vq->next_used_head_idx = 0;
>> >>       vq->private_data = NULL;
>> >>       vq->acked_features = 0;
>> >>       vq->acked_backend_features = 0;
>> >> @@ -398,6 +399,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
>> >>                         GFP_KERNEL);
>> >>           if (!vq->indirect || !vq->log || !vq->heads)
>> >>               goto err_nomem;
>> >> +
>> >> +        memset(vq->heads, 0, sizeof(*vq->heads) * dev->iov_limit);
>> >>       }
>> >>       return 0;
>> >>   @@ -2374,12 +2377,49 @@ static int __vhost_add_used_n(struct vhost_virtqueue
>> >>   *vq,
>> >>                   unsigned count)
>> >>   {
>> >>       vring_used_elem_t __user *used;
>> >> +    struct vring_desc desc;
>> >>       u16 old, new;
>> >>       int start;
>> >> +    int begin, end, i;
>> >> +    int copy_n = count;
>> >> +
>> >> +    if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> >>
>> >>
>> >>
>> >> How do you guarantee that ids of heads are contiguous?
>> >>
>> >> There is no need to be contiguous for ids of heads.
>> >>
>> >> For example, I have three buffer { .id = 0, 15}, {.id = 20, 30} {.id = 15, 20}
>> >> for vhost_add_used_n. Then I will let the vq->heads[0].len=15.
>> >> vq->heads[15].len=5, vq->heads[20].len=10 as reorder. Once I found there is no
>> >> hold in the batched descriptors. I will expose them to driver.
>> >
>> > So spec said:
>> >
>> > "If VIRTIO_F_IN_ORDER has been negotiated, driver uses descriptors in
>> > ring order: starting from offset 0 in the table, and wrapping around
>> > at the end of the table."
>> >
>> > And
>> >
>> > "VIRTIO_F_IN_ORDER(35)This feature indicates that all buffers are used
>> > by the device in the same order in which they have been made
>> > available."
>> >
>> > This means your example is not an IN_ORDER device.
>> >
>> > The driver should submit buffers (assuming each buffer have one
>> > descriptor) in order {id = 0, 15}, {id = 1, 30} and {id = 2, 20}.
>> >
>> > And even if it is submitted in order, we can not use a batch because:
>> >
>> > "The skipped buffers (for which no used ring entry was written) are
>> > assumed to have been used (read or written) by the device completely."
>> >
>> > This means for TX we are probably ok, but for rx, unless we know the
>> > buffers were written completely, we can't write them in a batch.
>> >
>> > I'd suggest to do cross testing for this series:
>> >
>> > 1) testing vhost IN_ORDER support with DPDK virtio PMD
>> > 2) testing virtio IN_ORDER with DPDK vhost-user via testpmd
>> >
>> > Thanks
>> >
>> You are correct, for rx we can't do a batch because we have to let the driver
>> know the length of buffers.
> 
> Note that we can do a batch for rx when we know all the buffers have
> been fully written.
> 
>>
>> I think these circumstances can offer batch:
>> 1. tx
>> 2. rx with RX_MRGBUF feature, which introduce a header for each received buffer
>>
>> Consider batch is not a mandatory requirement for in order feature according to
>> spec.
>> I'd like to let current RFC patch focus on in order implementation, and send
>> another
>> patch series to improve performance by batching on above circumstances.
> 
> That's fine, how about simply starting from the patch that offers
> IN_ORDER when zerocopy is disabled?
> 

Yeah, I'd like to start from vsock device, which doesn't use zerocopy

Thanks
> Thanks
> 
>>
>> What's your opinon.
>>
>> Thanks
>> >
>> >>
>> >>
>> >> +        /* calculate descriptor chain length for each used buffer */
>> >>
>> >>
>> >>
>> >> I'm a little bit confused about this comment, we have heads[i].len for this?
>> >>
>> >> Maybe I should not use vq->heads, some misleading.
>> >>
>> >>
>> >> +        for (i = 0; i < count; i++) {
>> >> +            begin = heads[i].id;
>> >> +            end = begin;
>> >> +            vq->heads[begin].len = 0;
>> >>
>> >>
>> >>
>> >> Does this work for e.g RX virtqueue?
>> >>
>> >>
>> >> +            do {
>> >> +                vq->heads[begin].len += 1;
>> >> +                if (unlikely(vhost_get_desc(vq, &desc, end))) {
>> >>
>> >>
>> >>
>> >> Let's try hard to avoid more userspace copy here, it's the source of performance
>> >> regression.
>> >>
>> >> Thanks
>> >>
>> >>
>> >> +                    vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
>> >> +                           end, vq->desc + end);
>> >> +                    return -EFAULT;
>> >> +                }
>> >> +            } while ((end = next_desc(vq, &desc)) != -1);
>> >> +        }
>> >> +
>> >> +        count = 0;
>> >> +        /* sort and batch continuous used ring entry */
>> >> +        while (vq->heads[vq->next_used_head_idx].len != 0) {
>> >> +            count++;
>> >> +            i = vq->next_used_head_idx;
>> >> +            vq->next_used_head_idx = (vq->next_used_head_idx +
>> >> +                          vq->heads[vq->next_used_head_idx].len)
>> >> +                          % vq->num;
>> >> +            vq->heads[i].len = 0;
>> >> +        }
>> >> +        /* only write out a single used ring entry with the id corresponding
>> >> +         * to the head entry of the descriptor chain describing the last buffer
>> >> +         * in the batch.
>> >> +         */
>> >> +        heads[0].id = i;
>> >> +        copy_n = 1;
>> >> +    }
>> >>         start = vq->last_used_idx & (vq->num - 1);
>> >>       used = vq->used->ring + start;
>> >> -    if (vhost_put_used(vq, heads, start, count)) {
>> >> +    if (vhost_put_used(vq, heads, start, copy_n)) {
>> >>           vq_err(vq, "Failed to write used");
>> >>           return -EFAULT;
>> >>       }
>> >> @@ -2410,7 +2450,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct
>> >> vring_used_elem *heads,
>> >>         start = vq->last_used_idx & (vq->num - 1);
>> >>       n = vq->num - start;
>> >> -    if (n < count) {
>> >> +    if (n < count && !vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> >>           r = __vhost_add_used_n(vq, heads, n);
>> >>           if (r < 0)
>> >>               return r;
>> >> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> >> index d9109107a..7b2c0fbb5 100644
>> >> --- a/drivers/vhost/vhost.h
>> >> +++ b/drivers/vhost/vhost.h
>> >> @@ -107,6 +107,9 @@ struct vhost_virtqueue {
>> >>       bool log_used;
>> >>       u64 log_addr;
>> >>   +    /* Sort heads in order */
>> >> +    u16 next_used_head_idx;
>> >> +
>> >>       struct iovec iov[UIO_MAXIOV];
>> >>       struct iovec iotlb_iov[64];
>> >>       struct iovec *indirect;
>> >>
>> >>
>> >>
>>



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux