Re: [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM

Dear All,

I will try to sketch a global view of how to assign a physical IRQ to a
KVM guest using the following subsystems:

- on the kernel side: the VFIO driver, KVM irqfd and GSI routing
- on the user side: a QEMU system image featuring the VFIO QEMU device.

It aims at sharing knowledge and at checking that my understanding of the
existing code is correct (MSI routing is out of scope).

GSI routing table:

Each VM has its own routing table, which stores how a physical IRQ (the
gsi) is connected to the guest.

GSI routing entries contain the following fields (not exhaustive):
- gsi (the physical IRQ)
- irqchip (the virtual interrupt controller)
- irqchip.pin (the interrupt controller input the gsi is routed to)
- set(), the method used to trigger the virtual interrupt for this entry
(it basically depends on the irqchip: IOAPIC, PIC, GIC, ...).
The complete definition can be found in include/linux/kvm_host.h.
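
For reference, here is an abridged sketch of that kernel-side definition
(taken from a 3.x-era kernel and trimmed to the fields discussed above;
check your tree for the exact layout):

struct kvm_kernel_irq_routing_entry {
        u32 gsi;                /* the physical IRQ */
        u32 type;               /* KVM_IRQ_ROUTING_IRQCHIP, _MSI, ... */
        int (*set)(struct kvm_kernel_irq_routing_entry *e,
                   struct kvm *kvm, int irq_source_id, int level,
                   bool line_status);
        union {
                struct {
                        unsigned irqchip;   /* which virtual irqchip */
                        unsigned pin;       /* input pin on that irqchip */
                } irqchip;
                struct msi_msg msi;
        };
        struct hlist_node link;
};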

IRQFD:

The irqfd framework makes it possible to assign physical IRQs (the gsi
above) to KVM guests. An eventfd is associated with a physical IRQ (the
gsi). When the eventfd is signaled (typically by a VFIO driver ISR), the
irqfd framework is responsible for injecting the virtual IRQ associated
with that physical IRQ. This all happens on the kernel side. The injection
is performed through the virtual interrupt controller and becomes visible
to the guest on its next run.
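
As an illustration, here is a minimal sketch of the driver-side trigger: a
VFIO driver's interrupt handler only needs to signal the eventfd, and irqfd
takes care of the injection (the struct and field names below are made up
for the example, not the actual VFIO code):

#include <linux/eventfd.h>
#include <linux/interrupt.h>

/* Hypothetical per-IRQ context: the eventfd was handed over by user space
 * (e.g. through VFIO_DEVICE_SET_IRQS) and also registered with KVM_IRQFD. */
struct my_irq_ctx {
        struct eventfd_ctx *trigger;
};

static irqreturn_t my_vfio_irq_handler(int irq, void *dev_id)
{
        struct my_irq_ctx *ctx = dev_id;

        /* Signaling the eventfd wakes up the irqfd side in KVM, which then
         * looks up the gsi in the routing table and injects the virtual IRQ. */
        eventfd_signal(ctx->trigger, 1);

        return IRQ_HANDLED;
}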

The _irqfd struct (eventfd.c) itself stores the VM it applies to and the
gsi it is linked with.

The irqfd framework relies on the KVM routing table to find the remaining
information needed to perform the guest injection:
- the virtual interrupt controller used for injection (irqchip)
- the input pin of that interrupt controller (irqchip.pin)
- the set() function irqfd must call to trigger the virtual IRQ

Although the _irqfd struct has a routing entry field (irq_entry), that
field is not used for GSIs (it is used for MSI). Thus when the eventfd is
signaled, the routing table is searched for the associated GSI and set()
is then called.

Note that irqfd also lets you register a second, optional eventfd (called
the resampler), which KVM signals when the virtual IRQ is completed (EOI).
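
To make this concrete, here is a minimal user-space sketch of the KVM_IRQFD
registration, including a resampler (error handling omitted; vm_fd and gsi
are assumed to be set up already):

#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int assign_irqfd(int vm_fd, uint32_t gsi)
{
        int trigger  = eventfd(0, 0);   /* signaled by the VFIO driver ISR */
        int resample = eventfd(0, 0);   /* signaled by KVM on guest EOI    */
        struct kvm_irqfd irqfd;

        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd         = trigger;
        irqfd.gsi        = gsi;
        irqfd.flags      = KVM_IRQFD_FLAG_RESAMPLE;
        irqfd.resamplefd = resample;

        /* From now on the kernel injects the virtual IRQ routed to this gsi
         * whenever "trigger" is signaled; user space is out of the loop.   */
        return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}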

QEMU side:

The IRQ routing table can be set from user space with the
KVM_SET_GSI_ROUTING ioctl. Typically this would be QEMU's job at init
time, after which QEMU would no longer be involved in IRQ injection.
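
A sketch of what such an init-time call could look like, routing a single
GSI to a pin of the in-kernel irqchip (the GSI and pin numbers are
arbitrary examples, error handling is omitted):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int route_one_gsi(int vm_fd)
{
        struct kvm_irq_routing *table;
        int ret;

        /* One header plus a single routing entry. */
        table = calloc(1, sizeof(*table) +
                          sizeof(struct kvm_irq_routing_entry));
        table->nr = 1;
        table->entries[0].gsi  = 40;                      /* example GSI           */
        table->entries[0].type = KVM_IRQ_ROUTING_IRQCHIP;
        table->entries[0].u.irqchip.irqchip = 0;          /* the in-kernel irqchip */
        table->entries[0].u.irqchip.pin     = 8;          /* example input pin     */

        ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
        free(table);
        return ret;
}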

As a side note, it is also interesting that KVM_ASSIGN_DEV_IRQ makes it
possible to define a correspondence between a host physical IRQ and a
guest virtual IRQ, but it relies on PCI device assignment. This API does
not support MSI, which is a drawback compared to the irqfd one. It seems
to me that implementing irqfd and GSI routing is more straightforward and
maybe more future proof.


ARM PORTING:

Let's consider how to apply this in the ARM world. On ARM we only have a
single kind of irqchip, namely the VGIC; the virtual interrupt controller
is implemented in virt/kvm/arm/vgic.c.

irq_comm.c still contains quite a lot of architecture-specific code and
needs to be retargeted for ARM, as it is today for x86 or IA64. For
example, kvm_set_pic_irq would need to call kvm_vgic_inject_vfio_irq.
CONFIG_ARM_GIC could be used to #ifdef the code; an alternative would be
to refactor the code into architecture-specific files.

My current understanding is that the KVM irqchip and IRQ routing code are
only used by irqfd and assigned_dev. I am not sure whether there is any
interest in supporting injection of IRQs other than SPIs (SGIs and PPIs
are not supposed to be routed by either irqfd or assigned_dev; is that a
correct understanding?).

Please feel free to correct and comment.

Thank you in advance

Best Regards

Eric






On 05/02/2014 08:10 PM, Antonios Motakis wrote:
> A potential and straightforward way to augment the KVM_SET_GSI_ROUTING
> API for ARM/VGIC:
> 
> New capability: KVM_CAP_IRQ_ROUTING_PPI
> 
> In addition to the existing routing entry types:
> /* gsi routing entry types */
> #define KVM_IRQ_ROUTING_IRQCHIP 1
> #define KVM_IRQ_ROUTING_MSI 2
> 
> Add new routing entry type:
> #define KVM_IRQ_ROUTING_IRQCHIP_PPI 3
> 
> In the kvm_irq_routing entry struct add an entry for the new union case:
> struct kvm_irq_routing_entry {
>     __u32 gsi;
>     __u32 type;
>     __u32 flags;
>     __u32 pad;
>     union {
>         struct kvm_irq_routing_irqchip irqchip;
>         struct kvm_irq_routing_irqchip_ppi irqchip_ppi;
>         struct kvm_irq_routing_msi msi;
>         __u32 pad[8];
>     } u;
> };
> 
> The new structure may look like this:
> struct kvm_irq_routing_irqchip_ppi {
>     __u32 irqchip;
>     __u32 vcpu;
>     __u32 pin;
> };
> 
> 
> SPIs would be set the usual way via the KVM_IRQ_ROUTING_IRQCHIP type.
> PPIs would be set using the new KVM_IRQ_ROUTING_IRQCHIP_PPI type, and
> the target VCPU may be specified via __u32 vcpu.
> 
> The __u32 vcpu variable would be interpreted as is currently done with
> KVM_IRQ_LINE on ARM: via kvm_get_vcpu(kvm, vcpu_idx).
> 
> This way more than 256 VCPUs may be added to a system without breaking
> KVM_IRQ_LINE; it suffices to create new routing tables for the VGIC (the
> default ones will still map to only 256 VCPUs).
> 
> 
> 
> On Fri, Apr 18, 2014 at 7:11 PM, Antonios Motakis
> <a.motakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> 
> 
> 
>     On Mon, Apr 14, 2014 at 3:45 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> 
>         On 11/04/14 12:09, Antonios Motakis wrote:
>         > On Thu, Apr 10, 2014 at 12:51 PM, Peter Maydell
>         > <peter.maydell@xxxxxxxxxx> wrote:
>         >>
>         >> On 10 April 2014 09:58, Antonios Motakis
>         >> <a.motakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>         >>> Though in this case, what makes IRQ routing support useful is not any
>         >>> particular feature it enables, but how it is used as a standard
>         >>> interface towards in-kernel IRQ chips for KVM. The eventfd support in
>         >>> KVM makes heavy use of that, so IRQ routing gives us IRQFDs without
>         >>> having to completely butcher all the eventfd and irqfd code.
>         >>
>         >> I think you should propose a concrete API and give examples
>         >> of how userspace would be using it; these abstract discussions
>         >> aren't really coming together in my head. Can the kernel
>         >> just set up the initial routing mapping as 1:1 so userspace
>         >> can ignore the pointless extra level of indirection?
>         >>
>         >
>         > Yes, this is what the user gets by default. Unless KVM_SET_GSI_ROUTING
>         > is used, userspace should not be able to tell the difference.
>         >
>         > KVM_IRQ_LINE is used to inject an IRQ, and based on the provided irq
>         > field the right VGIC pin will be stimulated. The mapping of the irq
>         > field to a VGIC pin would be as it is already documented today:
>         >
>         >>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>         >>   field: | irq_type  | vcpu_index |     irq_id     |
>         >>
>         >> The irq_type field has the following values:
>         >> - irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
>         >> - irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
>         >>                (the vcpu_index field is ignored)
>         >> - irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
>         >
>         > This should be still valid, by default. The only thing that routing
>         > adds, is the capability to use KVM_SET_GSI_ROUTING to change this
>         > mapping to something else (and towards the pins of multiple IRQ chips,
>         > if that need comes up).
>         >
>         > Though the part that is of interest to IRQFDs is not the new API to
>         > change the routing. The neat point is that we get an abstraction in
>         > the kernel that allows us to interact with the IRQ chip without having
>         > to deal with the semantics of how that IRQ should be interpreted on
>         > that platform, and the IRQFD code makes use of that.
>         >
>         > With KVM_SET_GSI_ROUTING one can provide an array of struct
>         > kvm_irq_routing_entry entries:
>         >
>         > struct kvm_irq_routing_entry {
>         >     __u32 gsi;
>         >     __u32 type;
>         >     __u32 flags;
>         >     __u32 pad;
>         >     union {
>         >         struct kvm_irq_routing_irqchip irqchip;
>         >         struct kvm_irq_routing_msi msi;
>         >         __u32 pad[8];
>         >     } u;
>         > };
>         >
>         > struct kvm_irq_routing_irqchip {
>         >     __u32 irqchip;
>         >     __u32 pin;
>         > };
>         >
>         > struct kvm_irq_routing_msi {
>         >     __u32 address_lo;
>         >     __u32 address_hi;
>         >     __u32 data;
>         >     __u32 pad;
>         > };
>         >
>         > __u32 gsi is the global interrupt that we want to match to an IRQ pin.
>         > We map this to an __u32 irqchip and __u32 pin.
>         >
>         > For VGIC we just need to define what pins we will expose. For VGICv2
>         > that would be 8 CPUs times 16 PPIs plus the SPIs.
> 
>         Note that this will somehow change for GICv3, which supports up
>         to 2^32
>         CPUs, and up to 2^32 interrupt IDs. We could decide to limit
>         ourselves
>         to, let's say, 256 CPUs, and 16bits of ID space, but that would be a
>         rather massive limitation.
> 
> 
>     Hm, that limitation is pretty interesting actually...
>     KVM_SET_GSI_ROUTING is a vm ioctl, so to do this properly we need to
>     set some GSIs at the vcpu level... Seems we either limit ourselves,
>     or we find a neat way to change the API.
> 
>     KVM_IRQ_LINE however is already limited to 256 CPUs. So a way to
>     encode more than 256 target CPUs with KVM_SET_GSI_ROUTING would
>     actually enable us to use more than 256 VCPUs without breaking
>     KVM_IRQ_LINE in the future, since the existing limit in that case
>     would be just a default that we can change.
> 
> 
>         > Another difference from other platforms is that we would accept to
>         > reroute only based on the 24 least significant bits; the 8 most
>         > significant bits we already have defined that we need to distinguish
>         > between in kernel and out of kernel IRQs. We would only support
>         > routing for the in-kernel GIC. Asking to reroute an out of kernel GIC,
>         > should return an error to userspace.
> 
>         Why do we have to be tied to the current representation that
>         userspace
>         uses? It seems to me like an unnecessary limitation.
> 
> 
>     I guess it is a matter of taste. By allowing that, we would lock out
>     userspace from using an out of kernel GIC as soon as it decided to
>     change the routing. Of course, if userspace decides to do that it
>     almost certainly plans to use the in kernel implementation anyway.
>      
> 
>                 M.
>         --
>         Jazz is not dead. It just smells funny...
> 
> 
> 
> 
>     -- 
>     Antonios Motakis
>     Virtual Open Systems
> 
> 
> 
> 
> -- 
> Antonios Motakis
> Virtual Open Systems
> 
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



