On Thu, 2014-11-13 at 09:39 +0000, Shannon Zhao wrote:
> When we use only virtio-mmio or vhost-net without irqfd, the device uses qemu_set_irq (within qemu)
> to inject the interrupt, and at the same time qemu updates "VIRTIO_MMIO_INTERRUPT_STATUS" to tell the guest
> driver what the interrupt is for. All these things are done by qemu. Though qemu_set_irq will call
> kvm_set_irq to do the real interrupt injection, the trigger is qemu, and it can update the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register before injecting the interrupt.
>
> But if we use vhost-net with irqfd, the device uses the ioeventfd mechanism to inject the interrupt.
> When an interrupt happens, it isn't passed to qemu; instead the irqfd eventually calls kvm_set_irq
> to inject the interrupt directly. All these things are done in kvm, and it can't update the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register.
> So if the guest driver still uses the old interrupt handler, it has to read the
> "VIRTIO_MMIO_INTERRUPT_STATUS" register to get the interrupt reason and run different handlers
> for different reasons. But the register contains nothing, so none of the handlers will be called.
>
> Have I made myself clear? :-)

I see... (well, at least I believe I see ;-)

Of course this means that with irqfd, the device is simply non-compliant with the spec. It also means that extending the status register doesn't help you at all, so we can drop this idea.

Paradoxically, that's good news (of a sort :-) because I don't think we should be making such a fundamental change to the spec at this date. I'll discuss it with others in the TC and I'm open to being convinced otherwise, but I believe the spec should stay as it is.

Having said that, I see no problem with experimenting with the device model so that the next version of the standard can be extended. Three suggestions I have would be:

1. Drop the virtio-pci-like case with two interrupts (one for config changes, one for all queues), as it doesn't bring anything new. Just make all the interrupts individual.

2. Change the way the interrupts are acknowledged: instead of a bitmask, have a register which takes a queue number to ack that queue's interrupt, and ~0 to ack the config interrupt.

3. Change the version of the device to (initially) 0x80000003. I've just made an executive decision :-) that non-standard-compliant devices should have the MSB of the version number set (I'll propose reserving this range of versions in a future version of the spec). We'll consider it a "prototype of version 3". Then make the driver behave in the new way when (and only when) such a device version is observed.

Also, I remembered I haven't addressed one of your previous comments:

On Wed, 2014-11-12 at 08:32 +0000, Shannon Zhao wrote:
> > One point I'd like to make is that the device was intentionally designed
> > with simplicity in mind first, performance later (something about
> > "embedded" etc :-). Changing this assumption is of course possible, but
> Ah, I think ARM is not only about embedded things. Maybe it could have a wider application,
> such as micro servers. Just my personal opinion.

By all means, I couldn't agree more. But there's one thing you have to keep in mind: it doesn't matter whether the real hardware has PCI or not; what matters is what is emulated by qemu/KVM. Therefore, if you can just get the PCI host controller working in the qemu virt machine (which should be much simpler now, as we have generic support for PCI controllers/bridges in the kernel), you'll be able to forget the issue of virtio-mmio and multiple interrupts and still enjoy your performance gains :-)

Does it all make sense?

Pawel

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
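A minimal C sketch of the interrupt behaviour discussed in this thread, assuming a hypothetical prototype device. It shows why a legacy handler that dispatches only on the interrupt status register does nothing when irqfd injects an interrupt without the register being updated, and how suggestions 2 and 3 (a per-queue ack register taking a queue number or ~0, and gating the new behaviour on version 0x80000003) might look. The two status bits mirror VIRTIO_MMIO_INT_VRING and VIRTIO_MMIO_INT_CONFIG from Linux's virtio_mmio.h; every other name (`proto_dev`, `proto_ack_write`, `MAX_QUEUES`, and so on) is hypothetical and not part of any spec or existing driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values matching VIRTIO_MMIO_INT_VRING / VIRTIO_MMIO_INT_CONFIG
 * in Linux's include/uapi/linux/virtio_mmio.h. */
#define INT_VRING   (1u << 0)
#define INT_CONFIG  (1u << 1)

/* Hypothetical constants for the non-standard prototype. */
#define PROTO_VERSION_FLAG  0x80000000u     /* MSB set: non-standard device */
#define PROTO_VERSION_V3    0x80000003u     /* "prototype of version 3" */
#define ACK_CONFIG          (~(uint32_t)0)  /* writing ~0 acks the config irq */
#define MAX_QUEUES          8u              /* hypothetical queue count */

/* Legacy-style dispatch: the guest learns the interrupt reason only from
 * the status register. If irqfd injected the interrupt and nothing
 * updated the register, status is 0 and no handler runs. */
static int legacy_dispatch(uint32_t status)
{
    int handled = 0;
    if (status & INT_VRING)
        handled++;  /* would run the virtqueue handlers */
    if (status & INT_CONFIG)
        handled++;  /* would run the config-change handler */
    return handled;
}

/* Hypothetical device-side state: one pending flag per queue plus one
 * for config changes (every interrupt individual, as in suggestion 1). */
struct proto_dev {
    bool queue_pending[MAX_QUEUES];
    bool config_pending;
};

/* Device-side handler for a guest write to the hypothetical ack register.
 * A queue number acks that queue; ~0 acks the config interrupt.
 * Returns false if the value names no valid interrupt. */
static bool proto_ack_write(struct proto_dev *dev, uint32_t value)
{
    if (value == ACK_CONFIG) {
        dev->config_pending = false;
        return true;
    }
    if (value < MAX_QUEUES) {
        dev->queue_pending[value] = false;
        return true;
    }
    return false;
}

/* True for any version in the proposed non-standard (MSB set) range. */
static bool version_is_nonstandard(uint32_t version)
{
    return (version & PROTO_VERSION_FLAG) != 0;
}

/* Driver side: use the new per-queue scheme when, and only when,
 * the device reports exactly the prototype version. */
static bool driver_uses_proto_irqs(uint32_t version)
{
    return version == PROTO_VERSION_V3;
}
```

Keeping the version check an exact match, rather than testing only the MSB, follows the "when (and only when)" wording: any other non-standard version would keep the legacy behaviour.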