Re: kvm device assignment and MSI-X masking

On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote:
> On 2012-08-14 16:31, Alex Williamson wrote:
> > On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
> >> On 2012-08-14 16:05, Alex Williamson wrote:
> >>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
> >>>> Hi Alex,
> >>>>
> >>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
> >>>>
> >>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
> >>>>         /*
> >>>>          * Vector masked, disable it
> >>>>          *
> >>>>          * XXX It's not clear if we can or should actually attempt
> >>>>          * to mask or disable the interrupt.  KVM doesn't have
> >>>>          * support for pending bits and kvm_assign_set_msix_entry
> >>>>          * doesn't modify the device hardware mask.  Interrupts
> >>>>          * while masked are simply not injected to the guest, so
> >>>>          * are lost.  Can we get away with always injecting an
> >>>>          * interrupt on unmask?
> >>>>          */
> >>>>
> >>>> I'm wondering what made you think that we won't inject if the vector is
> >>>> masked like this (i.e. in the shadow MSI-X table). Can you recall the
> >>>> details?
> >>>>
> >>>> I'm trying to refactor this code so that the KVM interface encapsulates
> >>>> the kernel interface details a bit more; I'm not trying to fix anything.
> >>>> Still, I would like to avoid introducing regressions.
> >>>
> >>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
> >>> sense to me at the time.  I think I was trying to say that not only do
> >>> we not have a way to mask the physical hardware, but if we did, we don't
> >>> have a way to retrieve the pending bits, so any pending interrupts while
> >>> masked would be lost.  We might be able to deal with that by posting a
> >>> spurious interrupt on unmask, but for now we do nothing as masking is
> >>> usually done just to update the vector.  Thanks,
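
Concretely, the "inject on unmask" idea above would be little more than
an unconditional injection on the masked->unmasked edge of
msix_mmio_write().  Untested sketch, assuming the current kvm_set_irq()
helper and the route we've already programmed for the vector:

    /*
     * We can't read the device's pending bits, so assume an interrupt
     * may have fired while the vector was masked and inject one on
     * every masked->unmasked transition.  MSI is edge triggered, so a
     * spurious injection is harmless to a well-behaved driver.
     */
    static void msix_kick_after_unmask(AssignedDevice *adev, int i)
    {
        kvm_set_irq(kvm_state, adev->entry[i].gsi, 1);
    }

(msix_kick_after_unmask() is a hypothetical helper; it would be called
at the end of the "Vector unmasked" branch.)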
> >>
> >> Ok, thanks for the clarification.
> >>
> >> While we're at it, do you also recall whether this
> >>
> >> --- a/hw/device-assignment.c
> >> +++ b/hw/device-assignment.c
> >> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> >>               */
> >>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
> >>              /* Vector unmasked */
> >> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
> >> -                /* Previously unassigned vector, start from scratch */
> >> -                assigned_dev_update_msix(pdev);
> >> -                return;
> >> -            } else {
> >> -                /* Update an existing, previously masked vector */
> >> -                struct kvm_irq_routing_entry orig = adev->entry[i];
> >> -                int ret;
> >> -
> >> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
> >> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
> >> -                adev->entry[i].u.msi.data = entry->data;
> >> -
> >> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
> >> -                if (ret) {
> >> -                    fprintf(stderr,
> >> -                            "Error updating irq routing entry (%d)\n", ret);
> >> -                    return;
> >> -                }
> >> -
> >> -                kvm_irqchip_commit_routes(kvm_state);
> >> -            }
> >> +            assigned_dev_update_msix(pdev);
> >>          }
> >>      }
> >>  }
> >>
> >> would make a meaningful difference for known workloads? I'm trying to get
> >> rid of direct routing table manipulations, but I would also like to
> >> avoid introducing things like kvm_irqchip_update_msi_route unless really
> >> necessary. Or could VFIO make use of that as well?
> > 
> > It makes me a little nervous, but I don't know that it won't work.
> > There's a lot more latency in turning off MSI-X and completely
> > rebuilding it than there is in updating the routing of a single vector.
> > You can imagine that irqbalance could be triggering this path pretty
> > regularly.  Increasing vectors beyond what was previously set up is more
> > of an init-time event, so the latency doesn't bother me as much.  We'd
> > probably have to send some spurious interrupts for anything we might
> > have missed if we take the high-latency path.
> 
> Yeah, good points.
> 
> > 
> > VFIO is already a little more abstracted, making use of the msix vector
> > use and release interface, but we do still make use of the kvm_irqchip
> > irqfd/virq interfaces.
> 
> Hmm, but due to the nature of the callbacks, we always disable/re-enable
> on mask/unmask. So VFIO will be slower than current device assignment in
> this regard.

It's a bit awkward; I'm not thrilled with those msix callbacks, but they
seem to work.  I have a similar comment in vfio_msix_vector_release()
suggesting that maybe we should just disable direct injection on mask so
that qemu-msix can do the masking and fill in the PBA, roughly like the
sketch below.
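
Untested sketch of that, with the vfio bits from memory (so treat the
helper names as illustrative rather than the actual API):

    static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
    {
        VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
        VFIOMSIVector *vector = &vdev->msi_vectors[nr];
        int fd = event_notifier_get_fd(&vector->interrupt);

        /*
         * Rather than tearing the vector down, detach the KVM irqfd
         * and bounce the interrupt through a QEMU fd handler.  An
         * interrupt that fires while the guest has the vector masked
         * then goes through emulated MSI-X, which sets the PBA bit
         * for us.
         */
        kvm_irqchip_remove_irqfd_notifier(kvm_state, &vector->interrupt,
                                          vector->virq);
        qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
    }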

> BTW, how do you handle the device's PBA? Pass it through to the guest?

We could, but I'm trying to use the qemu-msix infrastructure, which
handles the PBA.  We've been working happily without good PBA support
for so long that I haven't bothered to work on a channel to get to the
physical PBA yet.
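
For reference, if we ever do mirror the physical PBA into the emulated
one, the layout side is trivial: per the PCI spec, vector n's pending
bit is bit (n % 64) of QWORD (n / 64) in the PBA:

    #include <stdbool.h>
    #include <stdint.h>

    /* Test vector 'n' in a raw PBA mapping (one pending bit per
     * vector, packed into 64-bit QWORDs). */
    static bool msix_pba_pending(const uint64_t *pba, unsigned int n)
    {
        return (pba[n / 64] >> (n % 64)) & 1;
    }
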
Thanks,

Alex
