Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 15, 2013 at 05:59:11PM -0600, Scott Wood wrote:
> On 02/15/2013 05:18:31 PM, Paul Mackerras wrote:
> >On Fri, Feb 15, 2013 at 02:05:41PM -0600, Scott Wood wrote:
> >> On 02/14/2013 06:01:08 PM, Paul Mackerras wrote:
> >> >From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> >> >
> >> >This adds in-kernel emulation of the XICS (eXternal Interrupt
> >> >Controller Specification) interrupt controller specified by
> >PAPR, for
> >> >both HV and PR KVM guests.
> >> >
> >> >This adds a new KVM_CREATE_IRQCHIP_ARGS ioctl, which is like
> >> >KVM_CREATE_IRQCHIP in that it indicates that the virtual machine
> >> >should use in-kernel interrupt controller emulation, but also
> >takes an
> >> >argument struct that contains the type of interrupt controller
> >> >architecture and an optional parameter.  Currently only one
> >type value
> >> >is defined, that which indicates the XICS architecture.
> >>
> >> Would the device config API I posted a couple days ago work for you?
> >
> >I suppose it could be made to work.  It doesn't feel like a natural
> >fit though, because your API seems to assume (AFAICT) that a device is
> >manipulated via registers at specific physical addresses, so I would
> >have to invent an artificial set of registers with addresses and bit
> >layouts, that aren't otherwise required.  The XICS is operated from
> >the guest side via hcalls, not via emulated MMIO.
> 
> I don't think it makes such an assumption.  The MPIC device has
> physical registers, so it exposes them, but it also exposes things
> that are not physical registers (e.g. the per-IRQ input state).  The
> generic device control layer leaves interpretation of attributes up
> to the device.
> 
> I think it would be easier to fit XICS into the device control api
> model than to fit MPIC into this model, not to mention what would
> happen if we later want to emulate some other type of device -- x86
> already has at least one non-irqchip emulated device (i8254).

I have no particular objection to the device control API per se, but
I have two objections to using it as the primary interface to the XICS
emulation.

First, I dislike the magical side-effect where creating a device of a
particular type (e.g. MPIC or XICS) automatically attaches it to the
interrupt lines of the vcpus.  I prefer an explicit request to do
in-kernel interrupt control.  Further, the magic means that you can
only have one instance of the device, whereas you might want to model
the interrupt controller architecture as several devices.  You could
do that using several device types, but then the interconnections
between them would also be magic.

Secondly, it means that we are completely abandoning any attempt to
define an abstract or generic interface to in-kernel interrupt
controller emulations.  Each device will have its own unique set of
attribute groups and its own unique userspace code to drive it, with
no commonality between them.

> >Part of the reason I went with the API that I did is that that was
> >what was agreed on at KVM Forum (as far as I can tell, not having been
> >at the meeting).  Your device API seems to be quite different to that.
> 
> I wasn't there either.  It's fine to have discussions at such
> events, but it should not preclude others from having input, or
> better ideas from being considered afterward.

Sure - I was just trying to fit in with the expressed wish of the
maintainer.

> >> >The XICS emulation supports up to 1048560 interrupt sources.
> >> >Interrupt source numbers below 16 are reserved; 0 is used to
> >mean no
> >> >interrupt and 2 is used for IPIs.  Internally these are
> >represented in
> >> >blocks of 1024, called ICS (interrupt controller source)
> >entities, but
> >> >that is not visible to userspace.
> >> >
> >> >Two other new ioctls allow userspace to control the interrupt
> >> >sources.  The KVM_IRQCHIP_SET_SOURCES ioctl sets the priority,
> >> >destination cpu, level/edge sensitivity and pending state of a
> >range
> >> >of interrupt sources, creating them if they don't already
> >exist.  The
> >> >KVM_IRQCHIP_GET_SOURCES ioctl returns that information for a
> >range of
> >> >interrupt sources (they are required to already exist).
> >>
> >> Why is it userspace's job to control these?  If you use
> >KVM_IRQ_PENDING
> >
> >These are primarily there to support live migration.  For live
> >migration, userspace needs to be able to extract the entire state of
> >the virtual machine from the old guest, and then set the new guest to
> >that exact same state.
> 
> In that case, the state it describes is insufficient for MPIC.

Yes, MPIC has other random stuff in it besides interrupt control, so
that's not surprising.

> >We have live migration working in qemu for
> >pSeries guests with in-kernel XICS emulation using this interface.
> >If you're not doing live migration,
> 
> We don't yet, but would prefer not to assume that it'll never happen.
> 
> >> for interrupt injection, what if there's a race with the user
> >changing
> >> other flags via MMIO?  Maybe this isn't an issue with XICS, but
> >this is
> >> being presented as a generic API.
> >
> >They're not used while the guest is running, as I said, but even if
> >they were, there is appropriate locking in there to handle any races.
> 
> OK, KVM_IRQ_LINE is still used for interrupt injection.  I was
> hoping to avoid going through a standardized interface that forces a
> global interrupt numberspace.

Why?

> How do MSIs get injected?

Just like other interrupts - from the point of view of the interrupt
controller they're edge-triggered interrupt sources.

> BTW, do you have any plans regarding irqfd?

I'm going to look at that next.

> 
> >> >+struct kvm_irq_sources {
> >> >+	__u32 irq;
> >> >+	__u32 nr_irqs;
> >> >+	__u64 __user *irqbuf;
> >> >+};
> >>
> >> Please no pointers in UAPI -- this would require a compat
> >wrapper with
> >> 32-bit user and 64-bit kernel.
> >
> >Hmmm, you're right.  I suppose it will have to be a fixed-size buffer,
> 
> It doesn't need to be a fixed size buffer.  You can still have
> pointers, but they need to be represented as a plain "__u64" with
> users casting the pointer.
> 
> >> >+/* irqbuf entries are laid out like this: */
> >> >+#define KVM_IRQ_SERVER_SHIFT	0
> >> >+#define KVM_IRQ_SERVER_MASK	0xffffffffULL
> >> >+#define KVM_IRQ_PRIORITY_SHIFT	32
> >> >+#define KVM_IRQ_PRIORITY_MASK	0xff
> >> >+#define KVM_IRQ_LEVEL_SENSITIVE	(1ULL << 40)
> >> >+#define KVM_IRQ_MASKED		(1ULL << 41)
> >> >+#define KVM_IRQ_PENDING		(1ULL << 42)
> >>
> >> What does "server" mean?  Do you mean "laid out like this for XICS"?
> >
> >Sorry, I should have made that "destination" rather than "server".
> >You're right, "server" is confusing, but it just means "where the
> >interrupt is sent to be handled".  It has nothing particularly to do
> >with "server" computers.
> 
> Right, I was aware of the IBM terminology here -- just thought it
> wasn't appropriate in a generic interface.

Sure.

> What about interrupt controllers that allow multiple destinations?

The destination can be an identifier for a group of vcpus, or even a
bitmap -- that's why I made it 32 bits.

> More than 256 priorities?  Different "levels" of output (normal,
> critical, machine check)?  Programmable vector numbers?  Active
> high/low control?

There are plenty of bits free in the 64 bits per source that I have
allowed.  We can accommodate those things.  (BTW, I think having more
than 256 priorities would be insane - do you know of any actual
example that does?)

> I just don't think irqchip state can be sanely made generic.
> 
> >> Let's please have a clean separation between what is generic and
> >what is
> >> implementation-specific.
> >
> >I believe that the interface is pretty cleanly generic - the model is
> >a set of interrupt sources and some per-vcpu state, with priorities to
> >decide which interrupts get delivered when.  That describes the basics
> >of just about any SMP-capable interrupt controller, including MPIC.
> 
> The per-vcpu state isn't even part of this AFAICT.  It's an
> XICS-specific ONE_REG -- which is fine, but all that's left of the
> "generic" API is the get/set sources which is an imperfect match to
> our per-IRQ state and it's not clear how an implementation should
> extend it.

Yes, the names of the bitfields in the ICP state word are
XICS-specific, but the concepts are pretty generic - current processor
priority, identifier for interrupt awaiting service, pending IPI
request priority, pending interrupt request priority.

> It would be straightforward to map it to the device control api by
> having XICS declare an attribute group that corresponds to IRQ
> sources (like KVM_DEV_MPIC_GRP_IRQ_ACTIVE) and the attribute data be
> what you've defined for kvm_irq_sources.  Or if you want to preserve
> the "set multiple IRQs at once" approach, you could have an
> attribute that takes exactly a kvm_irq_sources struct, though it
> might be better to have a generic batched attribute set/get ioctl
> (as has been proposed for ONE_REG if the need arises).

Sure, the device control API can probably do just about anything.
We could even use it to create and control vcpus and memory slots, but
that doesn't mean we should.

> >MPIC would still need an extra interface for userspace to save and
> >restore things like the timer registers at live migration time, and
> >for userspace to configure the base address, MPIC version, etc., of
> >course.
> 
> Yes, which pretty much means we need the device control api anyway
> -- or MPIC-specific ioctls, which I wanted to avoid.

I agree that MPIC has enough extra stuff in it, besides interrupt
control, that it probably needs your device control API, at least for
that extra stuff.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux