On 02/19/2013 06:41:11 PM, Paul Mackerras wrote:
> On Mon, Feb 18, 2013 at 04:43:27PM -0600, Scott Wood wrote:
> > On 02/15/2013 10:51:16 PM, Paul Mackerras wrote:
> > >The KVM_CREATE_IRQCHIP_ARGS ioctl says that you want emulation of a
> > >specific interrupt controller architecture connected to the vcpus'
> > >external interrupt inputs. In that sense it's explicit, compared
> > >to a generic "create device" ioctl that could be for any device.
> >
> > Hooking up to the CPU's interrupt lines is implicit in creating an
> > MPIC (and I'm fine with changing that), not in creating any device.
> > I don't see how it's worse than being implicit in calling
> > KVM_CREATE_IRQCHIP_ARGS (which doesn't allow for cascaded irqchips).
>
> First, KVM_CREATE_IRQCHIP_ARGS specifies the overall architecture of
> the interrupt control subsystem, so yes it does allow for cascaded
> controllers.

Fine, but in that case you're dealing with a new irqchip (or irqarch if
you prefer) type number (or param). If you wanted to take that
approach with cascaded MPICs (I wouldn't), you would create a new
device type number. I don't see the difference here.

> Secondly, the difference is that if you see a KVM_CREATE_IRQCHIP_ARGS
> call, you know that the vcpus' interrupt inputs will be driven by
> kernel code. If you see a KVM_CREATE_DEVICE call, you don't know
> that; they might be, or they might not be.

I just don't understand what you mean here. Nobody's suggesting that
we make this assumption as soon as you see a "KVM_CREATE_DEVICE" call
for any random device. It's specifically in the creation of a
KVM_DEV_TYPE_FSL_MPIC_20 or KVM_DEV_TYPE_FSL_MPIC_42 that this
assumption is currently made. I don't see how creation of one of those
specific devices is any different from calling KVM_CREATE_IRQCHIP_ARGS
in terms of the intent that can reasonably be inferred.
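
To be concrete, here's a rough sketch of the userspace side (not the
actual QEMU code; struct and constant names follow the current patchset
and may still change before merge).  The device type constant is what
makes the intent explicit:

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: create an in-kernel FSL MPIC v2.0 through the proposed
 * device API rather than KVM_CREATE_IRQCHIP_ARGS.
 */
static int create_mpic(int vmfd)
{
	struct kvm_create_device cd = {
		.type  = KVM_DEV_TYPE_FSL_MPIC_20,
		.flags = 0,
	};

	if (ioctl(vmfd, KVM_CREATE_DEVICE, &cd) < 0)
		err(1, "KVM_CREATE_DEVICE");

	/* cd.fd is a new fd for this MPIC instance; all further
	 * configuration goes through device attributes on that fd.
	 */
	return cd.fd;
}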

> > >You're doing a round trip to userspace for every MPIC register
> > >access by the guest? Seriously?
> >
> > No. Accesses by the guest get handled in the kernel. Accesses in
> > QEMU, including MSIs generated by virtio, get forwarded to the
> > kernel.
>
> OK, I missed the path where that gets done, then.

ppce500_pci has a memory region for e500-pci-bar0, which is an alias for
the ccsr memory region. The MPIC registers are a child of the ccsr
memory region. When hw/kvm/mpic.c in QEMU sees an access to that MPIC
memory region, it forwards the access to the kernel via
KVM_DEV_MPIC_GRP_REGISTER.
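
Roughly, the QEMU side of that forwarding looks like the following
sketch (not the actual hw/kvm/mpic.c code; mpicfd is the fd returned by
KVM_CREATE_DEVICE, and the attribute encoding follows the current
patchset, so details may change):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: push a 32-bit MMIO write from QEMU's MPIC memory region into
 * the in-kernel MPIC.  "offset" is the register offset within the MPIC
 * register space; the kernel handles it as if the guest had done the
 * access itself.
 */
static int mpic_reg_write(int mpicfd, uint64_t offset, uint32_t val)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_MPIC_GRP_REGISTER,
		.attr  = offset,
		.addr  = (uint64_t)(unsigned long)&val,
	};

	return ioctl(mpicfd, KVM_SET_DEVICE_ATTR, &attr);
}

Reads go the same way via KVM_GET_DEVICE_ATTR, and the virtio-generated
MSIs mentioned above reach the kernel as such register writes too.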

> > >It would be the current task priority. I assume MPIC maintains a
> > >16-bit map of the interrupt priorities in service, so that would
> > >need to be added.
> >
> > We don't maintain such a map in the emulation code. We have a
>
> Oh, so how do you handle EOI of nested interrupts?

We scan the in-service bitmap and choose the highest-priority interrupt
in it as the one the EOI must be referring to.

Just having a bitmap of priorities would only tell us the priority of
the highest-priority in-service interrupt, not which actual interrupt
it is.

> How do you know what to reset the CPU priority to in that case?

Once we identify the interrupt that is being EOId as above, we clear
that bit and check again to see if there's a remaining interrupt whose
priority is high enough to prevent a pending interrupt from being
delivered.
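
In pseudo-C (invented names, not the actual emulation code), the EOI
path amounts to something like this, assuming a per-CPU in-service
bitmap, a per-CPU pending bitmap, and a priority per source:

#include <stdint.h>

#define MAX_IRQ 256

struct irq_source {
	int priority;				/* priority field from IVPR */
};

struct irq_dest {
	uint32_t inservice[MAX_IRQ / 32];	/* in-service sources */
	uint32_t pending[MAX_IRQ / 32];		/* raised-but-not-acked sources */
};

static int test_irq(const uint32_t *map, int n)
{
	return (map[n / 32] >> (n % 32)) & 1;
}

/* Return the highest-priority source set in @map, or -1 if none. */
static int highest_prio_irq(const struct irq_source *src, const uint32_t *map)
{
	int best = -1, i;

	for (i = 0; i < MAX_IRQ; i++)
		if (test_irq(map, i) &&
		    (best < 0 || src[i].priority > src[best].priority))
			best = i;

	return best;
}

/*
 * EOI: the interrupt being completed is the highest-priority in-service
 * one.  Clear it, then deliver a pending interrupt only if its priority
 * beats both CTPR and whatever is still in service.  Returns the source
 * to deliver now, or -1.
 */
static int mpic_eoi(const struct irq_source *src, struct irq_dest *dst, int ctpr)
{
	int done = highest_prio_irq(src, dst->inservice);
	int still, next, floor = ctpr;

	if (done < 0)
		return -1;			/* spurious EOI */
	dst->inservice[done / 32] &= ~(1u << (done % 32));

	still = highest_prio_irq(src, dst->inservice);
	if (still >= 0 && src[still].priority > floor)
		floor = src[still].priority;

	next = highest_prio_irq(src, dst->pending);
	return (next >= 0 && src[next].priority > floor) ? next : -1;
}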

> > per-CPU bitmap of the actual interrupt sources pending/active, which
> > is another attribute that would need to be added in order to support
> > migration on MPIC.
>
> Not really, that can be recomputed from the sources easily enough.

I'm skeptical. IPIs at least would be a problem, as would other
multicast interrupts if we allowed that. How would we distinguish an
interrupt that is pending from one that is in-service, just from the
sources?

In any case, it's a bit premature to discuss what we'd need for
migration until QEMU itself can save/restore a normal QEMU openpic.
When that time comes, attributes can be added for whatever extra state
we need (if any) to extend that capability to in-kernel MPICs.

-Scott