Re: Today's KVM/ARM Sync-up Call

Antonios Motakis <a.motakis@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 6 May 2014 16:11:46 +0200

Hello Christoffer,

Since there is much interest in the work Virtual Open Systems has
initiated on VFIO, let me share our current work in progress.

** IOEVENTFD and IRQFD support in VFIO
IRQFD support includes RESAMPLEFD support to properly handle
level sensitive interrupts from userspace.
Allow userspace to set an eventfd to mask/unmask interrupts.
Related to: IOEVENTFD and IRQFD support in KVM on ARM

** IOEVENTFD and IRQFD support in KVM on ARM
IRQFD and RESAMPLEFD are the interface in KVM
where we can directly 'plug' interrupts so they can be injected
without exiting to QEMU.
IRQFD support in KVM depends on a common abstraction between
IRQ chips that relies on the capability for IRQ routing. The major
challenge in harmonizing the VGIC with other KVM IRQ chips is the
support for IRQ routing.
Target: KVM on ARM RFC patch around the time of VFIO_PLATFORM v6

+ Patches to use the above with QEMU

We also plan to publish in the next few days a userspace test case based
on the PL330.

The above are targeted for the next version, VFIO_PLATFORM RFC patches
v6. Additionally,

- Direct ARM AMBA device support
Currently to use AMBA devices the devicetree has to be modified to
present them as platform devices. Allow VFIO_PLATFORM to bind
directly to AMBA devices.
- Expose device tree metadata through the VFIO API
There is a proposal to expose information pulled from the device tree
via the VFIO API, in order to assist QEMU in generating a valid device
tree for the guest. Use cases where this might be needed are not clear.
- Device specific functionalities (e.g. VFIO_DEVICE_RESET)
- IOMMUs with nested page tables support

=====================================================================

A recap of VFIO IRQ injection, the universe, and everything.

Currently we can inject interrupts with VFIO with an IOCTL. This
mostly works, but is not ideal for level sensitive interrupts. This
has to do with the way VFIO treats level sensitive interrupts. Instead
of firing the interrupt continuously, it is automatically masked as
soon as it fires. It is up to userspace to unmask it when it wishes to
receive new interrupts from the device.

Leaving the interrupt always unmasked is not desirable, since then we
would have the guest stalling due to an infinite number of interrupts
being injected.
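
To make the masking behaviour concrete, here is a toy model of a single
level-sensitive line. It is only a sketch and not VFIO code: the struct
and function names are invented for illustration, and "delivered"
simply counts notifications that would reach userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one level-sensitive interrupt line. */
struct level_irq {
    bool asserted;      /* device is holding the line active */
    bool masked;        /* host-side mask, set automatically on fire */
    unsigned delivered; /* notifications sent towards userspace */
};

/* Deliver at most one notification, then auto-mask the line. */
static void level_irq_eval(struct level_irq *irq)
{
    assert(irq != NULL);
    if (irq->asserted && !irq->masked) {
        irq->delivered++;
        irq->masked = true;
    }
}

static void level_irq_assert(struct level_irq *irq)
{
    irq->asserted = true;
    level_irq_eval(irq);
}

static void level_irq_deassert(struct level_irq *irq)
{
    irq->asserted = false;
}

/* Userspace unmasks; if the device still asserts, it fires again. */
static void level_irq_unmask(struct level_irq *irq)
{
    irq->masked = false;
    level_irq_eval(irq);
}
```

In this model, unmasking while the device still asserts the line gives
exactly one more notification, whereas a line that is never masked
would be re-delivered for as long as the device keeps it active.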

With a userspace VFIO driver, we can know the device semantics and
unmask accordingly, but with QEMU we want to be more generic (and we
want to take advantage of MMAPing the device regions directly to the
guest intermediate physical address space). There are two workarounds:
unmasking periodically with a timer, or unmasking every time the guest
accesses a device region. Each has its own disadvantages, and needless
to say both are hacks.

The proper way to do this is with RESAMPLEFD support, which is an
extra capability of IRQFD. With this feature we can set an eventfd to
fire whenever the guest does an EOI on the VGIC. QEMU can pass this
eventfd to VFIO, and unmask the interrupt in a safe and device
independent fashion.
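
For reference, this is roughly how the generic KVM_IRQFD request with a
resample eventfd is filled in, using struct kvm_irqfd and
KVM_IRQFD_FLAG_RESAMPLE from <linux/kvm.h>. The helper name is made up
for this sketch, and whether the ioctl succeeds on ARM depends on the
IRQFD work discussed in this mail.

```c
#include <assert.h>
#include <string.h>
#include <linux/kvm.h>

/*
 * Build a KVM_IRQFD request: trigger_fd injects the interrupt on 'gsi',
 * resample_fd is signalled by KVM when the guest EOIs that interrupt.
 * make_resample_irqfd() is a hypothetical helper, not a kernel API.
 */
static struct kvm_irqfd make_resample_irqfd(int trigger_fd, int resample_fd,
                                            unsigned int gsi)
{
    struct kvm_irqfd req;

    assert(trigger_fd >= 0 && resample_fd >= 0);
    memset(&req, 0, sizeof(req));
    req.fd = (__u32)trigger_fd;
    req.gsi = gsi;
    req.flags = KVM_IRQFD_FLAG_RESAMPLE;
    req.resamplefd = (__u32)resample_fd;
    return req;
}
```

The request would then be handed to the VM file descriptor with
ioctl(vm_fd, KVM_IRQFD, &req), and the same resample eventfd passed to
VFIO as the unmask eventfd.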

As far as I know KVM does not include a way to notify userspace of an
EOI or equivalent by the guest. The current hacks work, but I don't
consider them permanent solutions.

Also on top of those constraints, IRQFD is still very desirable
since it allows us to inject interrupts from VFIO to KVM by skipping
userspace completely. QEMU will just pass the right eventfds around on
setup. Both IRQFD and RESAMPLEFD implementations are pretty generic in
the KVM codebase, but we need IRQ routing to use them.

To be more precise, we need to expose the VGIC via the irqchip.c
interface for KVM IRQ chips, which however is tightly coupled with IRQ
routing. In fact, IRQ routing is implemented as a feature of that
interface - but we still need to provide some glue with it and the
default routing for our platform. The complication stems from the fact
that we have to worry about PPIs and their target CPU in addition to
plain SPIs - which other platforms don't.

=====================================================================

On to IRQ routing:

The default IRQ routing for an in-kernel VGIC IRQCHIP also needs to match the
semantics used for KVM_IRQ_LINE on KVM. Quote from the KVM API:
=====================================================================
ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus.  The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
=====================================================================
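
As a small worked example of the layout quoted above, the irq field can
be packed and unpacked as follows. The enum and function names are
local to this sketch (the kernel headers have their own constants); the
shifts follow the bit diagram directly.

```c
#include <assert.h>
#include <stdint.h>

/* Local names for the three irq_type values listed in the KVM API text. */
enum { IRQ_TYPE_CPU = 0, IRQ_TYPE_SPI = 1, IRQ_TYPE_PPI = 2 };

/* Pack irq_type (bits 31..24), vcpu_index (23..16) and irq_id (15..0). */
static uint32_t irq_line_encode(uint32_t irq_type, uint32_t vcpu_index,
                                uint32_t irq_id)
{
    /* vcpu_index has only 8 bits, hence the 256 vCPU limit. */
    assert(irq_type <= 0xff && vcpu_index <= 0xff && irq_id <= 0xffff);
    return (irq_type << 24) | (vcpu_index << 16) | irq_id;
}

static uint32_t irq_line_type(uint32_t irq) { return irq >> 24; }
static uint32_t irq_line_vcpu(uint32_t irq) { return (irq >> 16) & 0xff; }
static uint32_t irq_line_id(uint32_t irq)   { return irq & 0xffff; }
```

For instance, PPI 27 on vCPU 3 encodes as 0x0203001b, and SPI 40
(vcpu_index ignored, so left at 0) as 0x01000028.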

Doing this, we would have to make the implementation in vgic.c consistent with
other IRQCHIPS, by using the irqchip.c common code and putting it behind the
KVM IO BUS API.

However this would be just the default way IRQs would be routed to
guests. With IRQ routing the user can change this at will. Since we
don't want to keep the limitation of 256 vCPUs, we are currently
proposing this:

New capability: KVM_CAP_IRQ_ROUTING_PPI

In addition to the existing routing entry types:
/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2

Add new routing entry type:
#define KVM_IRQ_ROUTING_IRQCHIP_PPI 3

In the kvm_irq_routing_entry struct add an entry for the new union case:
struct kvm_irq_routing_entry {
    __u32 gsi;
    __u32 type;
    __u32 flags;
    __u32 pad;
    union {
        struct kvm_irq_routing_irqchip irqchip;
        struct kvm_irq_routing_irqchip_ppi irqchip_ppi;

        struct kvm_irq_routing_msi msi;
        __u32 pad[8];
    } u;
};

The new structure may look like this:
struct kvm_irq_routing_irqchip_ppi {
    __u32 irqchip;
    __u32 vcpu;
    __u32 pin;
};

SPIs would be set the usual way via the KVM_IRQ_ROUTING_IRQCHIP type.
PPIs would be set using the new KVM_IRQ_ROUTING_IRQCHIP_PPI type, and
the target VCPU may be specified via __u32 vcpu.

The __u32 vcpu variable would be interpreted as is currently done with
KVM_IRQ_LINE on ARM: via kvm_get_vcpu(kvm, vcpu_idx).

This way more than 256 VCPUs may be added to a system without breaking
KVM_IRQ_LINE; it suffices to create new routing tables for the VGIC
(the default ones will still map to only 256 VCPUs).
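
To illustrate, a routing entry for a PPI under this proposal could be
filled in as sketched below. The definitions are local copies so the
example stands alone: KVM_IRQ_ROUTING_IRQCHIP_PPI and struct
kvm_irq_routing_irqchip_ppi are the proposed additions and not merged
API, route_ppi() is a hypothetical helper, and the sketch assumes that
pin carries the GIC IRQ ID (16 to 31 for a PPI).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_IRQ_ROUTING_IRQCHIP     1
#define KVM_IRQ_ROUTING_MSI         2
#define KVM_IRQ_ROUTING_IRQCHIP_PPI 3   /* proposed, not merged */

struct kvm_irq_routing_irqchip { uint32_t irqchip, pin; };
struct kvm_irq_routing_irqchip_ppi { uint32_t irqchip, vcpu, pin; };
struct kvm_irq_routing_msi { uint32_t address_lo, address_hi, data, pad; };

struct kvm_irq_routing_entry {
    uint32_t gsi;
    uint32_t type;
    uint32_t flags;
    uint32_t pad;
    union {
        struct kvm_irq_routing_irqchip irqchip;
        struct kvm_irq_routing_irqchip_ppi irqchip_ppi;
        struct kvm_irq_routing_msi msi;
        uint32_t pad[8];
    } u;
};

/* Route 'gsi' to PPI 'pin' of the vCPU with index 'vcpu' (hypothetical). */
static struct kvm_irq_routing_entry route_ppi(uint32_t gsi, uint32_t vcpu,
                                              uint32_t pin)
{
    struct kvm_irq_routing_entry e;

    assert(pin >= 16 && pin <= 31);     /* assumed: pin is the GIC PPI ID */
    memset(&e, 0, sizeof(e));
    e.gsi = gsi;
    e.type = KVM_IRQ_ROUTING_IRQCHIP_PPI;
    e.u.irqchip_ppi.irqchip = 0;        /* single in-kernel VGIC assumed */
    e.u.irqchip_ppi.vcpu = vcpu;        /* full 32 bits, no 8-bit limit */
    e.u.irqchip_ppi.pin = pin;
    return e;
}
```

Since vcpu is a full __u32 here, an entry such as route_ppi(5, 300, 27)
can target a vCPU index that the 8-bit vcpu_index field of KVM_IRQ_LINE
cannot express.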

=====================================================================
The VFIO Side

For VFIO_PLATFORM the story is no different from what VFIO already does on
x86. We are going to support the same API already documented in
include/uapi/linux/vfio.h; this will be part of the next version of the
VFIO_PLATFORM patches.

In this case, instead of injecting interrupts via an IOCTL, the user
will use eventfds to inject interrupts, and also to mask/unmask them.
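
As a rough sketch of what this looks like from userspace, the existing
VFIO_DEVICE_SET_IRQS interface takes a struct vfio_irq_set followed by
the eventfd as payload. The helper and macro names below are invented
for the example; the struct fields and flags come from <linux/vfio.h>.
Which index and which actions VFIO_PLATFORM will accept is an
assumption until the patches are posted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/vfio.h>

/* Header plus one 32-bit eventfd as payload. */
#define IRQ_SET_EVENTFD_SIZE (sizeof(struct vfio_irq_set) + sizeof(int32_t))

/*
 * Fill 'buf' (at least IRQ_SET_EVENTFD_SIZE bytes, 4-byte aligned) with a
 * request that attaches 'fd' to one interrupt. 'action' is one of
 * VFIO_IRQ_SET_ACTION_TRIGGER, _MASK or _UNMASK.
 */
static struct vfio_irq_set *irq_set_eventfd_init(void *buf, uint32_t index,
                                                 uint32_t action, int32_t fd)
{
    struct vfio_irq_set *set = buf;

    assert(buf != NULL);
    memset(buf, 0, IRQ_SET_EVENTFD_SIZE);
    set->argsz = IRQ_SET_EVENTFD_SIZE;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | action;
    set->index = index;
    set->start = 0;
    set->count = 1;
    memcpy(set->data, &fd, sizeof(fd));
    return set;
}
```

The filled buffer would be passed to ioctl(device_fd,
VFIO_DEVICE_SET_IRQS, set): once with ACTION_TRIGGER and the eventfd
that KVM consumes as an irqfd, and once with ACTION_UNMASK and the
resample eventfd.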

Best regards

-- 
Antonios Motakis
Virtual Open Systems