On 02/19/2013 06:41:11 PM, Paul Mackerras wrote:
> On Mon, Feb 18, 2013 at 04:43:27PM -0600, Scott Wood wrote:
> > On 02/15/2013 10:51:16 PM, Paul Mackerras wrote:
> > >The KVM_CREATE_IRQCHIP_ARGS ioctl says that you want emulation of a
> > >specific interrupt controller architecture connected to the vcpus'
> > >external interrupt inputs. In that sense it's explicit, compared
> > >to a generic "create device" ioctl that could be for any device.
> >
> > Hooking up to the CPU's interrupt lines is implicit in creating an
> > MPIC (and I'm fine with changing that), not in creating any device.
> > I don't see how it's worse than being implicit in calling
> > KVM_CREATE_IRQCHIP_ARGS (which doesn't allow for cascaded irqchips).
>
> First, KVM_CREATE_IRQCHIP_ARGS specifies the overall architecture of
> the interrupt control subsystem, so yes it does allow for cascaded
> controllers.

Fine, but in that case you're dealing with a new irqchip (or irqarch if
you prefer) type number (or param). If you wanted to take that
approach with cascaded MPICs (I wouldn't), you would create a new
device type number. I don't see the difference here.

> Secondly, the difference is that if you see a KVM_CREATE_IRQCHIP_ARGS
> call, you know that the vcpus' interrupt inputs will be driven by
> kernel code. If you see a KVM_CREATE_DEVICE call, you don't know
> that; they might be, or they might not be.

I just don't understand what you mean here. Nobody's suggesting that
we make this assumption as soon as you see a "KVM_CREATE_DEVICE" call
for any random device. It's specifically in the creation of a
KVM_DEV_TYPE_FSL_MPIC_20 or KVM_DEV_TYPE_FSL_MPIC_42 that this
assumption is currently made. I don't see how creation of one of those
specific devices is any different from calling KVM_CREATE_IRQCHIP_ARGS
in terms of the intent that can reasonably be inferred.
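
To be concrete, here's a rough sketch of the userspace side (not the
actual QEMU code; struct and constant names follow the current patchset
and may still change before merge).  The device type constant is what
makes the intent explicit:

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: create an in-kernel FSL MPIC v2.0 through the proposed
 * device API rather than KVM_CREATE_IRQCHIP_ARGS.
 */
static int create_mpic(int vmfd)
{
	struct kvm_create_device cd = {
		.type  = KVM_DEV_TYPE_FSL_MPIC_20,
		.flags = 0,
	};

	if (ioctl(vmfd, KVM_CREATE_DEVICE, &cd) < 0)
		err(1, "KVM_CREATE_DEVICE");

	/* cd.fd is a new fd for this MPIC instance; all further
	 * configuration goes through device attributes on that fd.
	 */
	return cd.fd;
}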

> > >You're doing a round trip to userspace for every MPIC register
> > >access by the guest? Seriously?
> >
> > No. Accesses by the guest get handled in the kernel. Accesses in
> > QEMU, including MSIs generated by virtio, get forwarded to the
> > kernel.
>
> OK, I missed the path where that gets done, then.

ppce500_pci has a memory region for e500-pci-bar0, which is an alias for
the ccsr memory region. The MPIC registers are a child of the ccsr
memory region. When hw/kvm/mpic.c in QEMU sees an access to that MPIC
memory region, it forwards the access to the kernel via
KVM_DEV_MPIC_GRP_REGISTER.
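
Roughly, the QEMU side of that forwarding looks like the following
sketch (not the actual hw/kvm/mpic.c code; mpicfd is the fd returned by
KVM_CREATE_DEVICE, and the attribute encoding follows the current
patchset, so details may change):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: push a 32-bit MMIO write from QEMU's MPIC memory region into
 * the in-kernel MPIC.  "offset" is the register offset within the MPIC
 * register space; the kernel handles it as if the guest had done the
 * access itself.
 */
static int mpic_reg_write(int mpicfd, uint64_t offset, uint32_t val)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_MPIC_GRP_REGISTER,
		.attr  = offset,
		.addr  = (uint64_t)(unsigned long)&val,
	};

	return ioctl(mpicfd, KVM_SET_DEVICE_ATTR, &attr);
}

Reads go the same way via KVM_GET_DEVICE_ATTR, and the virtio-generated
MSIs mentioned above reach the kernel as such register writes too.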

> > >It would be the current task priority. I assume MPIC maintains a
> > >16-bit map of the interrupt priorities in service, so that would
> > >need to be added.
> >
> > We don't maintain such a map in the emulation code. We have a
>
> Oh, so how do you handle EOI of nested interrupts?

We scan the in-service bitmap and choose the highest-priority interrupt
in it as the one the EOI must be referring to.

Just having a bitmap of priorities would only tell us the priority of
the highest-priority in-service interrupt, not which actual interrupt
it is.

> How do you know what to reset the CPU priority to in that case?

Once we identify the interrupt that is being EOId as above, we clear
that bit and check again to see if there's a remaining interrupt whose
priority is high enough to prevent a pending interrupt from being
delivered.
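
In pseudo-C (invented names, not the actual emulation code), the EOI
path amounts to something like this, assuming a per-CPU in-service
bitmap, a per-CPU pending bitmap, and a priority per source:

#include <stdint.h>

#define MAX_IRQ 256

struct irq_source {
	int priority;				/* priority field from IVPR */
};

struct irq_dest {
	uint32_t inservice[MAX_IRQ / 32];	/* in-service sources */
	uint32_t pending[MAX_IRQ / 32];		/* raised-but-not-acked sources */
};

static int test_irq(const uint32_t *map, int n)
{
	return (map[n / 32] >> (n % 32)) & 1;
}

/* Return the highest-priority source set in @map, or -1 if none. */
static int highest_prio_irq(const struct irq_source *src, const uint32_t *map)
{
	int best = -1, i;

	for (i = 0; i < MAX_IRQ; i++)
		if (test_irq(map, i) &&
		    (best < 0 || src[i].priority > src[best].priority))
			best = i;

	return best;
}

/*
 * EOI: the interrupt being completed is the highest-priority in-service
 * one.  Clear it, then deliver a pending interrupt only if its priority
 * beats both CTPR and whatever is still in service.  Returns the source
 * to deliver now, or -1.
 */
static int mpic_eoi(const struct irq_source *src, struct irq_dest *dst, int ctpr)
{
	int done = highest_prio_irq(src, dst->inservice);
	int still, next, floor = ctpr;

	if (done < 0)
		return -1;			/* spurious EOI */
	dst->inservice[done / 32] &= ~(1u << (done % 32));

	still = highest_prio_irq(src, dst->inservice);
	if (still >= 0 && src[still].priority > floor)
		floor = src[still].priority;

	next = highest_prio_irq(src, dst->pending);
	return (next >= 0 && src[next].priority > floor) ? next : -1;
}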

> > per-CPU bitmap of the actual interrupt sources pending/active, which
> > is another attribute that would need to be added in order to support
> > migration on MPIC.
>
> Not really, that can be recomputed from the sources easily enough.

I'm skeptical. IPIs at least would be a problem, as would other
multicast interrupts if we allowed that. How would we distinguish an
interrupt that is pending from one that is in-service, just from the
sources?

In any case, it's a bit premature to discuss what we'd need for
migration until QEMU itself can save/restore a normal QEMU openpic.
When that time comes, attributes can be added for whatever extra state
we need (if any) to extend that capability to in-kernel MPICs.

-Scott