Hello Christoffer,

Since there is much interest in the work Virtual Open Systems has
initiated on VFIO, let me share our current work in progress.

** IOEVENTFD and IRQFD support in VFIO

   IRQFD support includes RESAMPLEFD support, to properly handle level
   sensitive interrupts from userspace.
   Allow setting an eventfd to mask/unmask interrupts.
   Related to: IOEVENTFD and IRQFD support in KVM on ARM

** IOEVENTFD and IRQFD support in KVM on ARM

   IRQFD and RESAMPLEFD are the interface in KVM where we can directly
   'plug' interrupts so they can be injected without exiting to QEMU.
   IRQFD support in KVM depends on a common abstraction between IRQ
   chips that relies on the capability for IRQ routing. The major
   challenge in harmonizing the VGIC with other KVM IRQ chips is the
   support for IRQ routing.
   Target: KVM on ARM RFC patch around the time of VFIO_PLATFORM v6
   + Patches to use the above with QEMU

We also plan to publish in the next days a userspace test case based on
the PL330.

The above are targeted for the next version, VFIO_PLATFORM RFC patches
v6.

Additionally,

- Direct ARM AMBA device support
  Currently, to use AMBA devices the device tree has to be modified to
  present them as platform devices. Allow VFIO_PLATFORM to bind
  directly to AMBA devices.

- Expose device tree metadata through the VFIO API
  There is a proposal to expose information pulled from the device
  tree via the VFIO API, in order to assist QEMU in generating a valid
  device tree for the guest. Use cases where this might be needed are
  not clear.

- Device specific functionalities (e.g. VFIO_DEVICE_RESET)

- IOMMUs with nested page tables support

=====================================================================

A recap of VFIO IRQ injection, the universe, and everything.

Currently we can inject interrupts with VFIO with an IOCTL. This mostly
works, but is not ideal for level sensitive interrupts. This has to do
with the way VFIO treats level sensitive interrupts.
Instead of firing the interrupt continuously, it is automatically
masked as soon as it fires. It is up to userspace to unmask it when it
wishes to receive new interrupts from the device. Leaving the interrupt
always unmasked is not desirable, since then we would have the guest
stalling due to an infinite number of interrupts being injected.

With a userspace VFIO driver, we can know the device semantics and
unmask accordingly, but with QEMU we want to be more generic (and we
want to take advantage of MMAPing the device regions directly to the
guest intermediate physical address space).

There are two workarounds: one is unmasking periodically with a timer,
the other is unmasking every time the guest accesses a device region.
Each has its own disadvantages. Needless to say, this is a hack.

The proper way to do this is with RESAMPLEFD support, which is an extra
capability of IRQFD. With this feature we can set an eventfd to fire
whenever the guest does an EOI on the VGIC. QEMU can pass this eventfd
to VFIO, and unmask the interrupt in a safe and device independent
fashion.

As far as I know KVM does not include a way to notify userspace for an
EOI or equivalent by the guest. The current hacks work, but I don't
consider them permanent solutions.

Also, on top of those constraints, IRQFD is still very desirable since
it allows us to inject interrupts from VFIO to KVM by skipping
userspace completely. QEMU will just pass the right eventfds around on
setup.

Both IRQFD and RESAMPLEFD implementations are pretty generic in the KVM
codebase, but we need IRQ routing to use them. To be more precise, we
need to expose the VGIC via the irqchip.c interface for KVM IRQ chips,
which however is tightly coupled with IRQ routing. In fact, IRQ routing
is implemented as a feature of that interface - but we still need to
provide some glue with it and the default routing for our platform.
The complication stems from the fact that we have to worry about PPIs
and their target CPU in addition to plain SPIs - which other platforms
don't.

=====================================================================

On to IRQ routing:

The default IRQ routing for an in-kernel VGIC IRQCHIP also needs to
match the semantics used for KVM_IRQ_LINE on KVM. Quote from the KVM
API:

=====================================================================
ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC
specs)
=====================================================================

To do this, we would have to make the implementation in vgic.c
consistent with other IRQCHIPS, by using the irqchip.c common code and
putting it behind the KVM IO BUS API.

However, this would be just the default way IRQs would be routed to
guests. With IRQ routing the user can change this at will.
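
To make the KVM_IRQ_LINE layout quoted above concrete, here is a minimal
sketch in plain C of how the irq field is packed and unpacked. The macro
and function names below are made up for illustration and are not the
kernel's own definitions; only the bit positions come from the quoted
API text. It also shows where the 256 VCPU ceiling comes from: the
vcpu_index field is only 8 bits wide.

```c
#include <stdint.h>

/* Illustrative names only; the layout follows the KVM API text quoted above. */
#define IRQ_TYPE_SHIFT   24
#define IRQ_TYPE_MASK    0xffu    /* bits 31..24 */
#define VCPU_INDEX_SHIFT 16
#define VCPU_INDEX_MASK  0xffu    /* bits 23..16: vcpu_index 0..255 */
#define IRQ_ID_MASK      0xffffu  /* bits 15..0 */

enum irq_type {
    IRQ_TYPE_CPU = 0,  /* out-of-kernel GIC: irq_id 0 is IRQ, 1 is FIQ */
    IRQ_TYPE_SPI = 1,  /* in-kernel GIC: SPI, vcpu_index is ignored */
    IRQ_TYPE_PPI = 2   /* in-kernel GIC: PPI, targets one VCPU */
};

/* Pack the three fields into the 32-bit irq value. */
static inline uint32_t irq_line_encode(uint32_t type, uint32_t vcpu_index,
                                       uint32_t irq_id)
{
    return ((type & IRQ_TYPE_MASK) << IRQ_TYPE_SHIFT) |
           ((vcpu_index & VCPU_INDEX_MASK) << VCPU_INDEX_SHIFT) |
           (irq_id & IRQ_ID_MASK);
}

/* Unpack the individual fields again. */
static inline uint32_t irq_line_type(uint32_t irq)
{
    return (irq >> IRQ_TYPE_SHIFT) & IRQ_TYPE_MASK;
}

static inline uint32_t irq_line_vcpu_index(uint32_t irq)
{
    return (irq >> VCPU_INDEX_SHIFT) & VCPU_INDEX_MASK;
}

static inline uint32_t irq_line_id(uint32_t irq)
{
    return irq & IRQ_ID_MASK;
}

/* Nonzero if vcpu_index can be expressed in the 8-bit field at all. */
static inline int irq_line_vcpu_fits(uint32_t vcpu_index)
{
    return vcpu_index <= VCPU_INDEX_MASK;
}
```

For example, PPI 27 on VCPU 3 would encode as
irq_line_encode(IRQ_TYPE_PPI, 3, 27) == 0x0203001b, while a VCPU index
of 256 or above cannot be represented in this format.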

Since we don't want to keep the limitation of 256 vCPUs, we are
currently proposing this:

New capability: KVM_CAP_IRQ_ROUTING_PPI

In addition to the existing routing entry types:

    /* gsi routing entry types */
    #define KVM_IRQ_ROUTING_IRQCHIP 1
    #define KVM_IRQ_ROUTING_MSI 2

Add a new routing entry type:

    #define KVM_IRQ_ROUTING_IRQCHIP_PPI 3

In the kvm_irq_routing_entry struct, add an entry for the new union
case:

    struct kvm_irq_routing_entry {
            __u32 gsi;
            __u32 type;
            __u32 flags;
            __u32 pad;
            union {
                    struct kvm_irq_routing_irqchip irqchip;
                    struct kvm_irq_routing_irqchip_ppi irqchip_ppi;
                    struct kvm_irq_routing_msi msi;
                    __u32 pad[8];
            } u;
    };

The new structure may look like this:

    struct kvm_irq_routing_irqchip_ppi {
            __u32 irqchip;
            __u32 vcpu;
            __u32 pin;
    };

SPIs would be set the usual way via the KVM_IRQ_ROUTING_IRQCHIP type.
PPIs would be set using the new KVM_IRQ_ROUTING_IRQCHIP_PPI type, and
the target VCPU may be specified via __u32 vcpu.

The __u32 vcpu variable would be interpreted as is currently done with
KVM_IRQ_LINE on ARM: via kvm_get_vcpu(kvm, vcpu_idx).

This way more than 256 VCPUs may be added to a system without breaking
KVM_IRQ_LINE; it suffices to create new routing tables for the VGIC
(the default ones will still map to only 256 VCPUs).

=====================================================================

The VFIO Side

For VFIO_PLATFORM the story is no different from what VFIO already does
on x86. We are going to support the same API already documented in
include/uapi/linux/vfio.h; this will be part of the next version of the
VFIO_PLATFORM patches.

In this case, instead of injecting interrupts via an IOCTL, the user
will use eventfds to inject interrupts, and also to mask/unmask them.

Best regards

--
Antonios Motakis
Virtual Open Systems