Re: KVM devices assignment; PCIe AER?

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Thu, 28 Oct 2010 07:39:59 +0200

On Wed, Oct 27, 2010 at 11:17:42PM -0600, Alex Williamson wrote:
> On Thu, 2010-10-28 at 06:58 +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 27, 2010 at 04:58:20PM -0600, Alex Williamson wrote:
> > > On Wed, 2010-10-27 at 14:43 -0700, Etienne Martineau wrote:
> > > > On Wed, 27 Oct 2010, Alex Williamson wrote:
> > > > > No, emulated devices trigger interrupts directly with qemu_set_irq.
> > > > > irqfds are currently only used by vhost afaik, since it's being
> > > > > interrupted externally, much like pass through devices are.
> > > > 
> > > > Fair enough. Thanks for the clarification.
> > > > 
> > > > > Sort of.  When the VFIO device triggers an interrupt, we get notified
> > > > > via the eventfd we've registered for that interrupt.  We can then call
> > > > > qemu_set_irq directly to raise that interrupt in the KVM kernel APIC.
> > > > > That much works today.
> > > > 
> > > > Understood, but performance-wise this is no good for KVM, right?
> > > 
> > > Right, bouncing interrupts and EOIs through qemu via eventfds is going
> > > to add latency.  On the interrupt path we already have irqfds, which
> > > will avoid the bounce through userspace, we just need to use them.
> > > Doing something similar with EOIs could avoid that path, giving us
> > > something comparable to current device assignment.
> > > 
> > > > > The irqfd mechanism is simply a way for KVM to
> > > > > directly consume the eventfd and raise an interrupt via a pre-setup
> > > > > vector.  That's yet to be implemented for INTx on VFIO, but should
> > > > > mostly be a matter of connecting existing pieces together.  It's working
> > > > > for MSI-X.
> > > > 
> > > > OK, I was under the impression you already had irqfd 'connected' to KVM from 
> > > > VFIO... This is why I was asking about the nature of the changes in VFIO.
> > > > 
> > > > > When VFIO sends an interrupt, it disables the physical device from
> > > > > generating more interrupts (this is where VFIO requires PCI 2.3
> > > > > compliant devices for the INTx disable bit in the command register).
> > > > > When the guest services the interrupt, we can detect this by catching
> > > > > the EOI of the IOAPIC.  At that point, we can re-enable interrupts on
> > > > > the device.  Wash, rinse, repeat.
> > > > >
> > > > > To do this in qemu, I created a callback on the ioapic where drivers can
> > > > > register for the interrupt they care about.  Since KVM moves the ioapic
> > > > > into the kernel, we need to extend this into KVM and have yet another
> > > > > eventfd mechanism.  It's possible that we could have the VFIO kernel
> > > > > module also receive this eventfd, re-enabling interrupts on the device,
> > > > > in much the same way as above.
> > > > 
> > > > In the case of KVM, where are you going to catch the EOI? For some 
> > > > reason I'm under the impression that this is part of KVM. If so, then how are 
> > > > you going to 'signal' VFIO? Cannot use eventfd here, right?
> > > 
> > > KVM already has an internal IRQ ACK notifier (which is what current
> > > device assignment uses to do the same thing), it's just a matter of
> > > adding a callback that does a kvm_register_irq_ack_notifier that sends
> > > off the eventfd signal.  I've got this working and will probably send
> > > out the KVM patch this week.  For now the eventfd goes to userspace, but
> > > this is where I imagine we could steal some of the irqfd code to make
> > > VFIO consume the irqfd signal directly.  Thanks,
> > > 
> > > Alex
> > 
> > BTW, how do we handle sharing the interrupt in guest?
> 
> I'm currently using flags to track whether we've asserted the interrupt
> in qemu, and only act on the eoi when the flag is set.  In my current
> setup, the guest puts the pass through device and USB on the same
> interrupt and using this filtering seems to be sufficient.  I think this
> should act just like bare metal, the device will reassert the interrupt
> if it still needs service, but we can avoid obviously gratuitous eois
> being passed down to vfio.
> 
> This will complicate having vfio intercept the eoi eventfd directly
> since it will then need to track the state too.  Another thing I've got
> working is letting vfio support older non-PCI-2.3 compliant devices so
> long as they can claim an exclusive interrupt (just like current code).
> We need to track whether the irq is enabled or disabled for this anyway
> so that we don't get unbalanced enables/disables.
> 
> Alex

Tracking state is also good for saving an extra config read
on each access.
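
To make the state tracking discussed above concrete, here is a rough
sketch in C. It is not taken from the VFIO or qemu patches: the struct,
the function names and the call counters are all hypothetical, and the
counters merely stand in for the real mask/unmask operations on the
device. It only illustrates two points from the thread: EOIs are acted
on only while we have the interrupt asserted, and mask/unmask calls
stay balanced because the cached state is checked first.

```c
#include <stdbool.h>

/* Hypothetical per-device INTx state; names are illustrative only. */
struct intx_state {
    bool asserted;     /* we raised the interrupt toward the guest */
    bool masked;       /* device interrupt is currently disabled */
    int  mask_calls;   /* stand-in for real "disable INTx" operations */
    int  unmask_calls; /* stand-in for real "enable INTx" operations */
};

/* Mask only if not already masked, so disables never stack up. */
static void intx_mask(struct intx_state *s)
{
    if (s->masked)
        return;
    s->masked = true;
    s->mask_calls++;
}

/* Unmask only if currently masked, so enables never stack up. */
static void intx_unmask(struct intx_state *s)
{
    if (!s->masked)
        return;
    s->masked = false;
    s->unmask_calls++;
}

/* Device fired: quiesce it and remember that we asserted the line. */
static void intx_on_interrupt(struct intx_state *s)
{
    intx_mask(s);
    s->asserted = true;
}

/*
 * Guest EOI on the (possibly shared) line.  Returns true if the EOI
 * was for an interrupt we asserted and the device was re-enabled,
 * false if it was filtered out as belonging to another device.
 */
static bool intx_on_eoi(struct intx_state *s)
{
    if (!s->asserted)
        return false;
    s->asserted = false;
    intx_unmask(s);
    return true;
}
```

With a zero-initialized struct, an EOI that arrives before any
interrupt from our device is dropped, and a second EOI after the first
one does not unmask again.  Because the cached flags answer "is it
masked?", no read of the device's config space is needed to decide.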

-- 
MST