Re: [PATCH v6 10/10] vdpa: Always start CVQ in SVQ mode

Eugenio Perez Martin <eperezma@xxxxxxxxxx> · Fri, 11 Nov 2022 15:38:22 +0100

On Fri, Nov 11, 2022 at 9:03 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> 在 2022/11/11 00:07, Eugenio Perez Martin 写道:
> > On Thu, Nov 10, 2022 at 7:25 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>
> >> 在 2022/11/9 01:07, Eugenio Pérez 写道:
> >>> Isolate control virtqueue in its own group, allowing to intercept control
> >>> commands but letting dataplane run totally passthrough to the guest.
> >>
> >> I think we need to tweak the title to "vdpa: Always start CVQ in SVQ
> >> mode if possible". Since SVQ for CVQ can't be enabled without ASID support?
> >>
> > Yes, I totally agree.
>
>
> Btw, I wonder if it's worth to remove the "x-" prefix for the shadow
> virtqueue. It can help for the devices without ASID support but want do
> live migration.
>

Sure I can propose on top.

>
> >
> >>> Signed-off-by: Eugenio Pérez <eperezma@xxxxxxxxxx>
> >>> ---
> >>> v6:
> >>> * Disable control SVQ if the device does not support it because of
> >>> features.
> >>>
> >>> v5:
> >>> * Fixing the not adding cvq buffers when x-svq=on is specified.
> >>> * Move vring state in vhost_vdpa_get_vring_group instead of using a
> >>>     parameter.
> >>> * Rename VHOST_VDPA_NET_CVQ_PASSTHROUGH to VHOST_VDPA_NET_DATA_ASID
> >>>
> >>> v4:
> >>> * Squash vhost_vdpa_cvq_group_is_independent.
> >>> * Rebased on last CVQ start series, that allocated CVQ cmd bufs at load
> >>> * Do not check for cvq index on vhost_vdpa_net_prepare, we only have one
> >>>     that callback registered in that NetClientInfo.
> >>>
> >>> v3:
> >>> * Make asid related queries print a warning instead of returning an
> >>>     error and stop the start of qemu.
> >>> ---
> >>>    hw/virtio/vhost-vdpa.c |   3 +-
> >>>    net/vhost-vdpa.c       | 138 ++++++++++++++++++++++++++++++++++++++---
> >>>    2 files changed, 132 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index e3914fa40e..6401e7efb1 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -648,7 +648,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>>    {
> >>>        uint64_t features;
> >>>        uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> >>> -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> >>> +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> >>> +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> >>>        int r;
> >>>
> >>>        if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> >>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>> index 02780ee37b..7245ea70c6 100644
> >>> --- a/net/vhost-vdpa.c
> >>> +++ b/net/vhost-vdpa.c
> >>> @@ -38,6 +38,9 @@ typedef struct VhostVDPAState {
> >>>        void *cvq_cmd_out_buffer;
> >>>        virtio_net_ctrl_ack *status;
> >>>
> >>> +    /* Number of address spaces supported by the device */
> >>> +    unsigned address_space_num;
> >>
> >> I'm not sure this is the best place to store thing like this since it
> >> can cause confusion. We will have multiple VhostVDPAState when
> >> multiqueue is enabled.
> >>
> > I think we can delete this and ask it on each device start.
> >
> >>> +
> >>>        /* The device always have SVQ enabled */
> >>>        bool always_svq;
> >>>        bool started;
> >>> @@ -101,6 +104,9 @@ static const uint64_t vdpa_svq_device_features =
> >>>        BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> >>>        BIT_ULL(VIRTIO_NET_F_STANDBY);
> >>>
> >>> +#define VHOST_VDPA_NET_DATA_ASID 0
> >>> +#define VHOST_VDPA_NET_CVQ_ASID 1
> >>> +
> >>>    VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> >>>    {
> >>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> @@ -242,6 +248,34 @@ static NetClientInfo net_vhost_vdpa_info = {
> >>>            .check_peer_type = vhost_vdpa_check_peer_type,
> >>>    };
> >>>
> >>> +static uint32_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> >>> +{
> >>> +    struct vhost_vring_state state = {
> >>> +        .index = vq_index,
> >>> +    };
> >>> +    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state);
> >>> +
> >>> +    return r < 0 ? 0 : state.num;
> >>
> >> Assume 0 when ioctl() fail is probably not a good idea: errors in ioctl
> >> might be hidden. It would be better to fallback to 0 when ASID is not
> >> supported.
> >>
> > Did I misunderstand you on [1]?
>
>
> Nope. I think I was wrong at that time then :( Sorry for that.
>
> We should differ from the case
>
> 1) no ASID support so 0 is assumed
>
> 2) something wrong in the case of ioctl, it's not necessarily a ENOTSUPP.
>

What action should we take here? Isn't it better to disable SVQ and
let the device run?

>
> >
> >>> +}
> >>> +
> >>> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> >>> +                                           unsigned vq_group,
> >>> +                                           unsigned asid_num)
> >>> +{
> >>> +    struct vhost_vring_state asid = {
> >>> +        .index = vq_group,
> >>> +        .num = asid_num,
> >>> +    };
> >>> +    int ret;
> >>> +
> >>> +    ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> >>> +    if (unlikely(ret < 0)) {
> >>> +        warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
> >>> +            asid.index, asid.num, errno, g_strerror(errno));
> >>> +    }
> >>> +    return ret;
> >>> +}
> >>> +
> >>>    static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> >>>    {
> >>>        VhostIOVATree *tree = v->iova_tree;
> >>> @@ -316,11 +350,54 @@ dma_map_err:
> >>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>    {
> >>>        VhostVDPAState *s;
> >>> -    int r;
> >>> +    struct vhost_vdpa *v;
> >>> +    uint32_t cvq_group;
> >>> +    int cvq_index, r;
> >>>
> >>>        assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>
> >>>        s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> +    v = &s->vhost_vdpa;
> >>> +
> >>> +    v->listener_shadow_vq = s->always_svq;
> >>> +    v->shadow_vqs_enabled = s->always_svq;
> >>> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_DATA_ASID;
> >>> +
> >>> +    if (s->always_svq) {
> >>> +        goto out;
> >>> +    }
> >>> +
> >>> +    if (s->address_space_num < 2) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (!vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> >>> +        return 0;
> >>> +    }
> >>
> >> Any reason we do the above check during the start/stop? It should be
> >> easier to do that in the initialization.
> >>
> > We can store it as a member of VhostVDPAState maybe? They will be
> > duplicated like the current number of AS.
>
>
> I meant each VhostVDPAState just need to know the ASID it needs to use.
> There's no need to know the total number of address spaces or do the
> validation on it during start (the validation could be done during
> initialization).
>

I thought we were talking about the virtio features.

So let's omit this check and simply try to set ASID? The worst case is
an -ENOTSUPP or -EINVAL, so the actions to take are the same as if we
don't have enough AS.

>
> >
> >>> +
> >>> +    /**
> >>> +     * Check if all the virtqueues of the virtio device are in a different vq
> >>> +     * than the last vq. VQ group of last group passed in cvq_group.
> >>> +     */
> >>> +    cvq_index = v->dev->vq_index_end - 1;
> >>> +    cvq_group = vhost_vdpa_get_vring_group(v->device_fd, cvq_index);
> >>> +    for (int i = 0; i < cvq_index; ++i) {
> >>> +        uint32_t group = vhost_vdpa_get_vring_group(v->device_fd, i);
> >>> +
> >>> +        if (unlikely(group == cvq_group)) {
> >>> +            warn_report("CVQ %u group is the same as VQ %u one (%u)", cvq_group,
> >>> +                        i, group);
> >>> +            return 0;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    r = vhost_vdpa_set_address_space_id(v, cvq_group, VHOST_VDPA_NET_CVQ_ASID);
> >>> +    if (r == 0) {
> >>> +        v->shadow_vqs_enabled = true;
> >>> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> >>> +    }
> >>> +
> >>> +out:
> >>>        if (!s->vhost_vdpa.shadow_vqs_enabled) {
> >>>            return 0;
> >>>        }
> >>> @@ -542,12 +619,38 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> >>>        .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> >>>    };
> >>>
> >>> +static uint32_t vhost_vdpa_get_as_num(int vdpa_device_fd)
> >>> +{
> >>> +    uint64_t features;
> >>> +    unsigned num_as;
> >>> +    int r;
> >>> +
> >>> +    r = ioctl(vdpa_device_fd, VHOST_GET_BACKEND_FEATURES, &features);
> >>> +    if (unlikely(r < 0)) {
> >>> +        warn_report("Cannot get backend features");
> >>> +        return 1;
> >>> +    }
> >>> +
> >>> +    if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> >>> +        return 1;
> >>> +    }
> >>> +
> >>> +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> >>> +    if (unlikely(r < 0)) {
> >>> +        warn_report("Cannot retrieve number of supported ASs");
> >>> +        return 1;
> >>
> >> Let's return error here. This help. to identify bugs of qemu or kernel.
> >>
> > Same comment as with VHOST_VDPA_GET_VRING_GROUP.
> >
> >>> +    }
> >>> +
> >>> +    return num_as;
> >>> +}
> >>> +
> >>>    static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>                                               const char *device,
> >>>                                               const char *name,
> >>>                                               int vdpa_device_fd,
> >>>                                               int queue_pair_index,
> >>>                                               int nvqs,
> >>> +                                           unsigned nas,
> >>>                                               bool is_datapath,
> >>>                                               bool svq,
> >>>                                               VhostIOVATree *iova_tree)
> >>> @@ -566,6 +669,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>        qemu_set_info_str(nc, TYPE_VHOST_VDPA);
> >>>        s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>
> >>> +    s->address_space_num = nas;
> >>>        s->vhost_vdpa.device_fd = vdpa_device_fd;
> >>>        s->vhost_vdpa.index = queue_pair_index;
> >>>        s->always_svq = svq;
> >>> @@ -652,6 +756,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        g_autoptr(VhostIOVATree) iova_tree = NULL;
> >>>        NetClientState *nc;
> >>>        int queue_pairs, r, i = 0, has_cvq = 0;
> >>> +    unsigned num_as = 1;
> >>> +    bool svq_cvq;
> >>>
> >>>        assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>        opts = &netdev->u.vhost_vdpa;
> >>> @@ -693,12 +799,28 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>            return queue_pairs;
> >>>        }
> >>>
> >>> -    if (opts->x_svq) {
> >>> -        struct vhost_vdpa_iova_range iova_range;
> >>> +    svq_cvq = opts->x_svq;
> >>> +    if (has_cvq && !opts->x_svq) {
> >>> +        num_as = vhost_vdpa_get_as_num(vdpa_device_fd);
> >>> +        svq_cvq = num_as > 1;
> >>> +    }
> >>
> >> The above check is not easy to follow, how about?
> >>
> >> svq_cvq = vhost_vdpa_get_as_num() > 1 ? true : opts->x_svq;
> >>
> > That would allocate the iova tree even if CVQ is not used in the
> > guest. And num_as is reused later, although we can ask it to the
> > device at device start to avoid this.
>
>
> Ok.
>
>
> >
> > If any, the linear conversion would be:
> > svq_cvq = opts->x_svq || (has_cvq && vhost_vdpa_get_as_num(vdpa_device_fd))
> >
> > So we avoid the AS_NUM ioctl if not needed.
>
>
> So when !opts->x_svq, we need to check num_as is at least 2?
>

As I think you proposed, we can simply try to set CVQ ASID and react to -EINVAL.

But this code here is trying to not to allocate iova_tree if we're
sure it will not be needed. Maybe it is easier to always allocate the
empty iova tree and only fill it if needed?

>
> >
> >>> +
> >>> +    if (opts->x_svq || svq_cvq) {
> >>
> >> Any chance we can have opts->x_svq = true but svq_cvq = false? Checking
> >> svq_cvq seems sufficient here.
> >>
> > The reverse is possible, to have svq_cvq but no opts->x_svq.
> >
> > Depending on that, this code emits a warning or a fatal error.
>
>
> Ok, as replied in the previous patch, I think we need a better name for
> those ones.
>

cvq_svq can be renamed for sure. x_svq can be aliased with other
variable if needed too.

> if (opts->x_svq) {
>          shadow_data_vq = true;
>          if(has_cvq) shadow_cvq = true;
> } else if (num_as >= 2 && has_cvq) {
>          shadow_cvq = true;
> }
>
> The other logic can just check shadow_cvq or shadow_data_vq individually.
>

Not sure if shadow_data_vq is accurate. It sounds to me as "Only
shadow data virtqueues but not CVQ".

shadow_device and shadow_cvq?

>
> >
> >>> +        Error *warn = NULL;
> >>>
> >>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>> -            goto err_svq;
> >>> +        svq_cvq = vhost_vdpa_net_valid_svq_features(features,
> >>> +                                                   opts->x_svq ? errp : &warn);
> >>> +        if (!svq_cvq) {
> >>
> >> Same question as above.
> >>
> >>
> >>> +            if (opts->x_svq) {
> >>> +                goto err_svq;
> >>> +            } else {
> >>> +                warn_reportf_err(warn, "Cannot shadow CVQ: ");
> >>> +            }
> >>>            }
> >>> +    }
> >>> +
> >>> +    if (opts->x_svq || svq_cvq) {
> >>> +        struct vhost_vdpa_iova_range iova_range;
> >>>
> >>>            vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> >>>            iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >>> @@ -708,15 +830,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>
> >>>        for (i = 0; i < queue_pairs; i++) {
> >>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>> -                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> >>> -                                     iova_tree);
> >>> +                                     vdpa_device_fd, i, 2, num_as, true,
> >>
> >> I don't get why we need pass num_as to a specific vhost_vdpa structure.
> >> It should be sufficient to pass asid there.
> >>
> > ASID is not known at this time, but at device's start. This is because
> > we cannot ask if the CVQ is in its own vq group, because we don't know
> > the control virtqueue index until the guest acknowledges the different
> > features.
>
>
> We can probe those during initialization I think. E.g doing some
> negotiation in the initialization phase.
>

We've developed this idea in other threads, let's continue there better.

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> > [1] https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg901685.html
> >
> >
> >>> +                                     opts->x_svq, iova_tree);
> >>>            if (!ncs[i])
> >>>                goto err;
> >>>        }
> >>>
> >>>        if (has_cvq) {
> >>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>> -                                 vdpa_device_fd, i, 1, false,
> >>> +                                 vdpa_device_fd, i, 1, num_as, false,
> >>>                                     opts->x_svq, iova_tree);
> >>>            if (!nc)
> >>>                goto err;
>