Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 15, 2013 at 02:05:41PM -0600, Scott Wood wrote:
> On 02/14/2013 06:01:08 PM, Paul Mackerras wrote:
> >From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> >
> >This adds in-kernel emulation of the XICS (eXternal Interrupt
> >Controller Specification) interrupt controller specified by PAPR, for
> >both HV and PR KVM guests.
> >
> >This adds a new KVM_CREATE_IRQCHIP_ARGS ioctl, which is like
> >KVM_CREATE_IRQCHIP in that it indicates that the virtual machine
> >should use in-kernel interrupt controller emulation, but also takes an
> >argument struct that contains the type of interrupt controller
> >architecture and an optional parameter.  Currently only one type value
> >is defined, that which indicates the XICS architecture.
> 
> Would the device config API I posted a couple days ago work for you?

I suppose it could be made to work.  It doesn't feel like a natural
fit though, because your API seems to assume (AFAICT) that a device is
manipulated via registers at specific physical addresses, so I would
have to invent an artificial set of registers with addresses and bit
layouts, that aren't otherwise required.  The XICS is operated from
the guest side via hcalls, not via emulated MMIO.

Part of the reason I went with the API that I did is that that was
what was agreed on at KVM Forum (as far as I can tell, not having been
at the meeting).  Your device API seems to be quite different to that.

> >The XICS emulation supports up to 1048560 interrupt sources.
> >Interrupt source numbers below 16 are reserved; 0 is used to mean no
> >interrupt and 2 is used for IPIs.  Internally these are represented in
> >blocks of 1024, called ICS (interrupt controller source) entities, but
> >that is not visible to userspace.
> >
> >Two other new ioctls allow userspace to control the interrupt
> >sources.  The KVM_IRQCHIP_SET_SOURCES ioctl sets the priority,
> >destination cpu, level/edge sensitivity and pending state of a range
> >of interrupt sources, creating them if they don't already exist.  The
> >KVM_IRQCHIP_GET_SOURCES ioctl returns that information for a range of
> >interrupt sources (they are required to already exist).
> 
> Why is it userspace's job to control these?  If you use KVM_IRQ_PENDING

These are primarily there to support live migration.  For live
migration, userspace needs to be able to extract the entire state of
the virtual machine from the old guest, and then set the new guest to
that exact same state.  We have live migration working in qemu for
pSeries guests with in-kernel XICS emulation using this interface.
If you're not doing live migration, the only time userspace needs to
use these is at initialization, when it does a SET_SOURCES to create
the interrupt sources it wants the VM to have.

> for interrupt injection, what if there's a race with the user changing
> other flags via MMIO?  Maybe this isn't an issue with XICS, but this is
> being presented as a generic API.

They're not used while the guest is running, as I said, but even if
they were, there is appropriate locking in there to handle any races.

> >+4.80 KVM_CREATE_IRQCHIP_ARGS
> >+
> >+Capability: KVM_CAP_IRQCHIP_ARGS
> >+Architectures: ppc
> 
> Why just ppc?

Just because only PPC has code to handle it at this point.  I would
hope that ARM and others could pick it up.  Maybe I should make it:

+Architectures: all (but so far only implemented on ppc)

or something.

> >+struct kvm_irq_sources {
> >+	__u32 irq;
> >+	__u32 nr_irqs;
> >+	__u64 __user *irqbuf;
> >+};
> 
> Please no pointers in UAPI -- this would require a compat wrapper with
> 32-bit user and 64-bit kernel.

Hmmm, you're right.  I suppose it will have to be a fixed-size buffer,
which is generally less efficient, but since it's mainly for migration
that probably doesn't matter.  In fact, I could probably use the
existing KVM_GET_IRQCHIP and KVM_SET_IRQCHIP ioctls, if I use the
`chip_id' field for `irq' and the `pad' field for `nr_irqs'.

> >+/* irqbuf entries are laid out like this: */
> >+#define KVM_IRQ_SERVER_SHIFT	0
> >+#define KVM_IRQ_SERVER_MASK	0xffffffffULL
> >+#define KVM_IRQ_PRIORITY_SHIFT	32
> >+#define KVM_IRQ_PRIORITY_MASK	0xff
> >+#define KVM_IRQ_LEVEL_SENSITIVE	(1ULL << 40)
> >+#define KVM_IRQ_MASKED		(1ULL << 41)
> >+#define KVM_IRQ_PENDING		(1ULL << 42)
> 
> What does "server" mean?  Do you mean "laid out like this for XICS"?

Sorry, I should have made that "destination" rather than "server".
You're right, "server" is confusing, but it just means "where the
interrupt is sent to be handled".  It has nothing particularly to do
with "server" computers.

> Let's please have a clean separation between what is generic and what is
> implementation-specific.

I believe that the interface is pretty cleanly generic - the model is
a set of interrupt sources and some per-vcpu state, with priorities to
decide which interrupts get delivered when.  That describes the basics
of just about any SMP-capable interrupt controller, including MPIC.

MPIC would still need an extra interface for userspace to save and
restore things like the timer registers at live migration time, and
for userspace to configure the base address, MPIC version, etc., of
course.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux