On Mon, Aug 13, 2012 at 02:48:25PM -0600, Alex Williamson wrote: > On Mon, 2012-08-13 at 22:50 +0300, Michael S. Tsirkin wrote: > > On Mon, Aug 13, 2012 at 12:17:25PM -0600, Alex Williamson wrote: > > > On Mon, 2012-08-13 at 19:59 +0300, Michael S. Tsirkin wrote: > > > > On Mon, Aug 13, 2012 at 10:48:15AM -0600, Alex Williamson wrote: > > > > > On Sun, 2012-08-12 at 10:49 +0300, Michael S. Tsirkin wrote: > > > > > > On Wed, Aug 01, 2012 at 01:06:42PM -0600, Alex Williamson wrote: > > > > > > > On Mon, 2012-07-30 at 19:12 -0600, Alex Williamson wrote: > > > > > > > > On Tue, 2012-07-31 at 03:36 +0300, Michael S. Tsirkin wrote: > > > > > > > > > On Mon, Jul 30, 2012 at 06:26:31PM -0600, Alex Williamson wrote: > > > > > > > > > > On Tue, 2012-07-31 at 03:01 +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > You keep saying this but it is still true: once irqfd > > > > > > > > > > > is closed eoifd does not get any more interrupts. > > > > > > > > > > > > > > > > > > > > How does that matter? > > > > > > > > > > > > > > > > > > Well if it does not get events it is disabled. > > > > > > > > > so you have one ifc disabling another, anyway. > > > > > > > > > > > > > > > > And a level irqfd without an eoifd can never be de-asserted. Either we > > > > > > > > make modular components, assemble them to do useful work, and > > > > > > > > disassemble them independently so they can be used by future interfaces > > > > > > > > or we bundle eoifd as just an option of irqfd. Which is it gonna be? > > > > > > > > > > > > > > I don't think I've been successful at explaining my reasoning for making > > > > > > > EOI notification a separate interface, so let me try again... > > > > > > > > > > > > > > When kvm is not enabled, the qemu vfio driver still needs to know about > > > > > > > EOIs to re-enable the physical interrupt. Since the ioapic is emulated > > > > > > > in qemu, we can setup a notifier for this and create abstraction to make > > > > > > > it non-x86 specific, etc. We just need to come up with a design and > > > > > > > implement it. But what happens when kvm is then enabled? ioapic > > > > > > > emulation moves to the kernel (assume kvm includes irqchip for this > > > > > > > argument even though it doesn't for POWER), qemu no longer knows about > > > > > > > EOIs, and the interface we just created to handle the non-kvm case stops > > > > > > > working. Is anyone going to accept adding a qemu EOI notification > > > > > > > interface that only works when kvm is not enabled? > > > > > > > > > > > > Yes, it's only a question of abstracting it at the right level. > > > > > > > > > > > > For example, if as you suggest below kvm gives you an eventfd that > > > > > > asserts an irq, laters automatically deasserts it and notifies another > > > > > > eventfd, we can do exactly this in both tcg and kvm: > > > > > > > > > > > > setup_level_irq(int gsi, int assert_eventfd, int deassert_eventfd) > > > > > > > > > > > > Not advocating this interface but pointing out that to make > > > > > > same abstraction to work in tcg and kvm, see what it does in > > > > > > each of them first. > > > > > > > > > > The tcg model I was thinking of is that we continue to use qemu_set_irq > > > > > to assert and de-assert the interrupt and add an eoi/ack notification > > > > > mechanism, much like the ack notifier that already exists in kvm. There > > > > > doesn't seem to be much advantage to creating a new interrupt > > > > > infrastructure in tcg that can trigger interrupts by eventfds, so I > > > > > assume VFIO is always going to be responsible for the translation of an > > > > > eventfd to an irq assertion, get some kind of notification through qemu, > > > > > de-assert the interrupt and unmask the device. With that model in mind, > > > > > perhaps it makes more sense why I've been keeping the eoi/ack separate > > > > > from irqfd. > > > > > > > > > > > > I suspect we therefore need a notification mechanism between kvm and > > > > > > > qemu to make it possible for that interface to continue working. > > > > > > > > > > > > Even though no one is actually using it. IMHO, this is a maintainance > > > > > > problem. > > > > > > > > > > That's why I'm designing it the way I am. VFIO will make use of it. It > > > > > will just be using the de-assert and notify mode vs a notify-only mode > > > > > that tcg would use. It would also be easy to add an option to vfio so > > > > > that we could fully test both modes. > > > > > > > > > > > > An > > > > > > > eventfd also seems like the right mechanism there. A simple > > > > > > > modification to the proposed KVM_EOIFD here would allow it to trigger an > > > > > > > eventfd when an EOI is written to a specific gsi on > > > > > > > KVM_USERSPACE_IRQ_SOURCE_ID (define a flag and pass gsi in place of > > > > > > > key). > > > > > > > > > > > > > > The split proposed here does require some assembly, but KVM_EOIFD is > > > > > > > re-usable as either a de-assert and notify mechanism tied to an irqfd or > > > > > > > a notify-only mechanism allowing users of a qemu EOI notification > > > > > > > infrastructure to continue working. vfio doesn't necessarily need this > > > > > > > middle ground, but can easily be used to test it. > > > > > > > > > > > > > > The alternative is that we pull eoifd into KVM_IRQFD and invent some > > > > > > > other new EOI interface for qemu. That means we get EOIs tied to an > > > > > > > irqfd via one path and other EOIs via another ioctl. Personally that > > > > > > > seems less desirable, but I'm willing to explore that route if > > > > > > > necessary. Thanks, > > > > > > > > > > > > > > Alex > > > > > > > > > > > > Maybe we should focus on the fact that we notify userspace that we > > > > > > deasserted interrupt instead of EOI. > > > > > > > > > > But will a tcg user want the de-assert? I assume not. The de-assert is > > > > > an optimization to allow us to bypass evaluation in userspace. In tcg > > > > > we're already there. Thanks, > > > > > > > > > > Alex > > > > > > > > Look what I am saying forget tcg and APIs. Build a kernel interface that > > > > makes sense. Then in qemu look at kvm and tcg and build abstraction for > > > > it. Building kernel interface so you can make nice abstractions in tcg > > > > is backwards. > > > > > > Can you suggest specifically what doesn't make sense? > > > > Interface is just very easy to misuse. Here are things that > > you expose that to me do not seem to make sense: > > > > - ability to create irqfd that by default can not be deasserted > > (you need eoifd to deassert) > > Well, it's not really the default, a user has to add a flag to get this > ability. > > > - interface to create eventfd that by default never gets events > > (you need irqfd to assert) > > In v8, this isn't the default, the user has to specify that they want to > use it to de-assert. > > > - creating ack eventfd requires level irqfd but you won't > > know it unless you read documentation > > This is also fixed in v8, you get a source ID, then hook it up to an > irqfd/irq ackfd any way you want. > > > - duplicating level/edge information that we already have in GSI > > Not really duplication, the edge/level information is several layers of > indirection away from this interface. As we've discussed in the past, > relying on that information also means that the behavior of an ioctl > depends on the state of another piece of emulated hardware which is > controlled by the guest at the time the ioctl is called. Personally, I > don't think that's a good characteristic. > > > Knowing all these quirks is a must if you want things to > > work, but you do not know them until you read documentation. > > This is not good interface, a good interface is > > hard to misuse and self-documenting. > > I think v8 makes improvements here, I'd be happy to hear your feedback > on it. > > > > For legacy interrupts VFIO needs to: > > > > > > - Assert an interrupt > > > > > > Eventfds seem to be the most efficient way to signal when to > > > assert an interrupt and gives us the flexibility that we can > > > send that signal to either another kernel module or to > > > userspace. KVM_IRQFD is designed for exactly this, but needs > > > modifications for level triggered interrupts. These include: > > > > > > - Using a different IRQ source ID > > > > > > GSIs are not exclusive, multiple devices may assert the > > > same GSI. IRQ source IDs are how KVM handles multiple > > > inputs. > > > > Actually, thinking about it some more, all assigned > > device interrupts are deasserted on ack, so together. > > And userspace does the OR in userspace already. > > > > So why is it not enough to give IRQFDs a single separate > > source ID, distinct from userspace but shared by all devices? > > We could do that, but then we lose any ability to filter the KVM irq ack > notifier based on whether a given IRQ source ID is asserted. This is > something you've been pushing for. We ended tracking it in irqfd, no? > Note that patch 1/6 of the v8 series > adds this generically for all irq ack notifier users. That's of course > just an optimization, How is it an optimization? > we could have IRQ source IDs re-used and that > might be a good solution if we ever start exhausting them. v8 allows > userspace to do this if it wants. How does userspace know whether it should do it or not? > > > - Assert-only > > > > > > KVM_IRQFD currently does assert->deassert to emulate an > > > edge triggered interrupt. For level, we need to be able > > > to signal a discrete assertion and de-assertion event. > > > This results in the modifications I've proposed to KVM_IRQFD. > > > > Actually is it really necessary at all? What happens if we assert and > > deassert immediately? If guest lost the interrupt, on EOI device will > > reassert resulting in another interrupt. > > It's been a while since I've tried, but I recall I used this as a > workaround early on in development and it did work. I don't feel it's a > proper representation of the hardware we're trying to emulate though and > istr that Avi wasn't too fond of it either. EOI hack is not a proper representation either. I think we were just confused and thought there's a race. > > > - Know when to de-assert an interrupt > > > > > > Servicing an interrupt is device specific, we can't know for any > > > random device what interactions with the device indicate service > > > of an interrupt. We therefore look to the underlying hardware > > > support where a vCPU writes an End Of Interrupt to the APIC to > > > indicate the chip should re-sample it's inputs and either > > > de-assert or continue asserting the interrupt level. Our device > > > may still require service at this point, but this mechanism has > > > proven effective with KVM assignment. > > > > > > This results in the notify-only portion of the EOIFD/IRQ_ACKFD. > > > > > > - Deassert an interrupt > > > > > > Now that we have an interrupt that's been asserted and we > > > suspect that we should re-evaluate the interrupt signal due to > > > activity possibly related to an EOI, we need a mechanism to > > > de-assert the interrupt. There are two possibilities here: > > > > > > - Test and de-assert > > > > > > Depending on hardware support for INTxDisable, we may be > > > able to poll whether the hardware is still asserting > > > it's interrupt and de-assert if quiesced. This > > > optimizes for the case where the interrupt is still > > > asserting as we avoid re-assertion and avoid unmasking > > > the device. > > > > > > - De-assert, test, (re-assert) > > > > > > Not all hardware supports INTxDisable, so we may have no > > > way to test whether the device is still asserting an > > > interrupt other than to unmask and see if it re-fires. > > > This not only supports the most hardware, but also > > > optimizes for the case where the device is quiesced. > > > > > > Taking the latter path results in the de-assert and notify > > > interface to the above EOIFD/IRQ_ACKFD. This reduces the number > > > of signals between components and supports the most hardware. > > > > > > That leaves dealing with the IRQ source ID. Initially I tried to hide > > > this from userspace as it's more of an implementation detail of KVM, but > > > in v8 I expose it as it offers more flexibility and (I hope) removes > > > some of the odd dependencies between interfaces imposed by previous > > > version. > > > > > > If you have specific suggestions how else to approach this, I welcome > > > the feedback. > > > It would be backwards to design an interface exclusively around a > > > single user, but it would be just as backwards to not envision how an > > > interface would be used in advance. Thanks, > > > > > > Alex > > > > Could you address two questions I ask above pls? > > If we really can use the same source ID for all irqfds, > > and if it's ok to deassert immediately after all, > > then large parts of code can go away. > > > > Or maybe I was away for too long and forgot > > what the problem were ... > > So if we de-assert immediately and remove the notify on de-assert, then > irq_ackfd becomes a notify-only interface. In v8 that's what it is at > it's base, with an option to de-assert. That option (patch 6/6) is a > tiny bit of code. But it is an interface that at least makes some sense. And it is also an existing one. > Removing the irq source ID isn't a clear win to me either. It removes the limitation on number of ackfd/irqfd that there is. > I'm becoming > a broken record, but v8 already simplifies the irq source ID allocation > and preserves the ability to filter irq ack notifications and targeted > re-use of irq source IDs if userspace decides to support that. Thanks, > > Alex I will look at v8. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html