Re: RFC: VDPA Interrupt vector distribution

Jason Wang <jasowang@xxxxxxxxxx> · Tue, 31 Jan 2023 14:02:04 +0800

On Mon, Jan 30, 2023 at 7:54 PM Eli Cohen <elic@xxxxxxxxxx> wrote:
>
>
> On 30/01/2023 13:34, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2023 at 12:01:23PM +0200, Eli Cohen wrote:
> >> On 30/01/2023 10:19, Jason Wang wrote:
> >>> Hi Eli:
> >>>
> >>> On Mon, Jan 23, 2023 at 1:59 PM Eli Cohen <elic@xxxxxxxxxx> wrote:
> >>>> VDPA allows hardware drivers to propagate interrupts from the hardware
> >>>> directly to the vCPU used by the guest. In a typical implementation, the
> >>>> hardware driver will assign the interrupt vectors to the virtqueues and report
> >>>> this information back through the get_vq_irq() callback defined in
> >>>> struct vdpa_config_ops.
> >>>>
> >>>> Interrupt vectors could be a scarce resource and may be limited. For such
> >>>> cases, we can let the administrator, through the vdpa tool, set the policy
> >>>> defining how to distribute the available vectors amongst the data virtqueues.
> >>>>
> >>>> The following policies are proposed:
> >>>>
> >>>> 1. First comes first served. Assign a vector to each data virtqueue by the
> >>>>       virtqueue index. Virtqueues which could not be assigned a dedicated vector
> >>>>       would use the hardware driver to propagate interrupts using the available
> >>>>       callback mechanism.
> >>>>
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all
> >>>>
> >>>>       This is the default mode and works even if "int=all" was not specified.
> >>>>
> >>>> 2. Use round robin distribution so virtqueues could share vectors.
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=all intmode=share
> >>>>
> >>>> 3. Assign vectors to RX virtqueues only.
> >>>> 3.1 Do not share vectors
> >>>>        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx
> >>>> 3.2 Share vectors
> >>>>        vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 int=rx intmode=share
> >>>>
> >>>> 4. Assign vectors to TX virtqueues only. Can share or not, like rx.
> >>>> 5. Fail device creation if number of vectors cannot be fulfilled.
> >>>>       vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 max_vq_pairs 8 int=rx intnum=8
> >>> I wonder:
> >>>
> >>> 1) how the administrator can know if there's sufficient resources for
> >>> one of the above policies.
> >> There's no established way to know. The idea is to use whatever there is
> >> assuming interrupt bypassing is always better than the callback mechanism.
> >>> 2) how does the administrator know which policy is the best assuming
> >>> the resources are sufficient? (E.g vectors to RX only or vectors to TX
> >>> only)
> >> I don't think there's a rule of thumb here but he needs to experiment what
> >> works best for him.
> >>> If it requires a vendor specific way or knowledge, I believe it's
> >>> better to code them in:
> >>>
> >>> 1) the vDPA parent or
> >>> 2) underlayer management tool or drivers
> >>>
> >>> Thanks
> >> I was wondering also about the current mechanism we have. The hardware
> >> driver reports irq number for each VQ.
> >>
> >> The guest driver sees a virtio pci device with MSIX vectors as the number of
> >> virtqueues.
> >>
> >> Suppose the hardware driver provided only 5 interrupt vectors while there
> >> are 16 VQs.
> >>
> >> Which MSIX vector at the guest really gets a posted interrupt and which
> >> one uses the callback handled by the hardware driver?
> > Not sure I understand.
> > If you get a single interrupt from hardware callback or posted
> > you can only drive one interrupt to guest, no?
> >
> For every VQ I have a chance to assign interrupt vector.
>
> Consider this scenario:
>
> mlx5_vdpa created with 16 data virtqueues.
>
> mlx5_vdpa associates VQ0 with an interrupt vector. The rest of the VQs
> don't get assigned vectors and use the old callback mechanism.
>
> When you go to the VM and run lspci, you will see the device has 16 MSIX
> vectors.

Note that the guest MSI-X vectors are emulated by software; you can
change their number by specifying the "vectors=X" parameter of
virtio-pci. Those MSI-X vectors are backed by eventfds, which QEMU
creates and passes to both KVM and vhost-vDPA.

>
> Do you know which of the MSIX vectors on the guest is the vector I
> assigned for VQ0?

The mapping from guest MSI-X vector to VQ0 is done via
queue_msix_vector in the capability, and it is under the control of
guest virtio-pci drivers.

The mapping from host MSI-X to guest MSI-X (required for the posted
interrupt) is done via matching the eventfd between KVM and vhost-vDPA
when assigning eventfds. So assuming:

1) the guest driver uses guest-visible MSI-X vector X for vq0
2) the host driver reports irqX via get_vq_irq(0)

Then the host MSI-X vector corresponding to irqX is mapped to vq0 (via
guest-visible MSI-X vector X) as a posted interrupt when that is
possible. If the posted interrupt can't work for some reason, the code
will fall back to vq_callback, which is a simple eventfd_signal().

Thanks

>
> >>>>
> >>>>
>
