On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
> > But we are just talking about sending messages from A to B or soldering
> > an input to an output pin. That's pretty generic. Give each output event
> > a virtual IRQ number and define where its output "line" should be linked
> > to (input pin of target controller). All what will be specific are the
> > IDs of those controllers.
>
> Hrm you seem to be saying something very different from Paolo here.
> Unless it's just a very very confused terminology.
>
> So let's see the powerpc "pseries" case. Things like embedded etc...
> might be quite different.

So I had a chat with Anthony who explained to me a bit more about what
the x86 stuff is about. It's pretty horrible I must say :-)

So correct me if I'm wrong, but you essentially have to differentiate
between MSI "outputs" and other (GSI) "outputs" due to the fact that
MSIs in x86 land don't act as normal interrupts going through a source
controller but instead get shot directly to the target CPU. Then you
have to establish some kind of "routing" from those GSIs to some IO
APIC, and from MSIs to local APICs. That's where I think there is a
fairly fundamental difference with us.

So let's cut that problem in two: the GSI bit and the MSI bit. The
reason is that the way x86 does MSIs seems to be fairly x86 specific; I
wouldn't be surprised if everybody else did MSIs like we do them, that
is, turn them into normal interrupts (ie, GSIs). But let's discuss that
below.

So the GSI bit. We can assume that GSIs in that context are basically
our "global interrupt numbers". This would apply to pretty much every
platform indeed. The routing here, if I understand things correctly,
consists of associating such a global interrupt number with a specific
input pin (or virtual pin) of a specific source controller (ie, IO
APIC). This would generally make sense in embedded space as well I
suppose, where you can have multiple or even cascaded interrupt
controllers of different breeds etc...

However, on the pseries system, this routing is essentially encoded in
the interrupt number itself. As I think I explained earlier, the number
is arbitrarily split in two parts, the top bits indicating the source
controller and the bottom bits the source within that controller
(sketched below). In qemu/kvm we have made an arbitrary split (whose
size I don't remember precisely) and we currently create only one
fairly big source controller, but we might change that in the future.

Thus there is no such thing as needing to "associate" or create routing
entries here. qemu will directly shoot "GSIs" using an ioctl and our
code can directly map that to a source controller without any routing
table of any sort. In fact, adding one would complicate things, since
we'd have a requirement that it's populated 1:1 or things would get
very confused indeed. So overall, there's no point for us to implement
or use that API or the "generic" code behind it; it would be pure
bloat, complication and problems.

However, making that code more generic might make sense for other
platforms (including other powerpc platforms such as embedded) where
multiple interrupt controllers may exist, though here too, it's
probably going to be fairly common that the GSI numbers are essentially
a bit field split with entire ranges assigned to a given PIC. We don't
have to emulate the x86+ACPI ability to individually remap interrupts.
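To make the above a bit more concrete, the whole "routing" on pseries
boils down to something like the sketch below. The names and the split
width are placeholders I'm making up for illustration (as said, I don't
remember the exact value we use in qemu/kvm):

    /*
     * Illustration only: ICS_SHIFT and the helper names are made up,
     * not the actual qemu/kvm values.  The point is that the "routing"
     * is nothing more than a shift and a mask on the flat global
     * interrupt number.
     */
    #define ICS_SHIFT      12                      /* placeholder split point */
    #define ICS_SRC_MASK   ((1u << ICS_SHIFT) - 1)

    static unsigned int girq_to_ics(unsigned int girq)
    {
            return girq >> ICS_SHIFT;       /* which source controller */
    }

    static unsigned int girq_to_src(unsigned int girq)
    {
            return girq & ICS_SRC_MASK;     /* source within that controller */
    }

A routing table would just have to be populated 1:1 with exactly that
information, which is why it buys us nothing.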
The case of MSIs now.

My understanding from what Anthony says is that your MSIs essentially
bypass the IO APIC and route directly to the local APIC, which is
equivalent to our presentation controller. You thus need specific APIs
to associate an MSI (which isn't a GSI) with a specific local APIC.

We have no such need at all. Our MSIs are decoded by the PCI host
bridge and directly turned into "normal" interrupts. In fact, in HW,
our bridges contain a special source controller that *is* essentially
the thing that gets hit by MSIs. So our MSIs are just normal interrupts
in the global space. Their numbers are assigned by qemu, the kernel
never knows about them. When an emulated device triggers an MSI, that
turns into a normal "trigger global interrupt X" ioctl to the kernel.
The only "knowledge" the kernel emulation gets along the way is an
argument to the ioctl that indicates whether this is a level set, level
reset, or edge type action (MSIs are edge, obviously), which dictates
how the delivery state machine will work (one shot vs. continuous until
cleared).

So qemu assigns interrupt numbers to MSIs and there's never any routing
to establish at the kernel level. That also means that the current API
that has tendrils all the way into devices in qemu for "getting the
virq for a given MSI" is totally unsuitable for us. In fact we don't
need a different API for KVM vs. full emulation. Everything on the qemu
side is the same until the qirq actually gets delivered, in which case
with KVM we'll shoot an ioctl rather than emulating the source
controller.

So the only APIs we need are these:

 - Create the IRQ chips themselves

 - Shoot an interrupt

 - Save and restore of individual source state for migration (the
   content of the state is very specific to a given IRQ chip
   implementation)

I fail to see how we can shoehorn any of that into generic code. It
doesn't fit the model you currently have at all, and making it do so
would add bloat and complexity without any benefit. IE. we wouldn't
"share" code, we would "add" code not otherwise useful.

The ARM situation might be different (and the powerpc situation for
other platforms such as mac99 and embedded) in that there might be some
value in having that GSI -> PIC input mapping, but here too I tend to
doubt it. We are probably better off starting with a cleaner slate
without the gross x86 baggage and using a unified flat number space in
the kernel, leaving all the complication of who is connected to whom to
qemu.

Cheers,
Ben.
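P.S. For the sake of concreteness, here is roughly what that minimal
surface could look like. Everything below (names, structures, ioctl
numbers) is made up purely for illustration; it is not the existing KVM
ABI, just a sketch of the three operations listed above:

    /* Sketch only -- none of these names or numbers are real KVM ABI. */
    #include <linux/types.h>
    #include <linux/ioctl.h>

    /* Action passed along with "shoot global interrupt X" */
    #define XICS_ACTION_LEVEL_SET    0      /* assert a level source */
    #define XICS_ACTION_LEVEL_RESET  1      /* de-assert it */
    #define XICS_ACTION_EDGE         2      /* one-shot; MSIs end up here */

    struct xics_args_create {
            __u32 nr_sources;       /* size of the (single, for now) chip */
    };

    struct xics_args_trigger {
            __u32 girq;             /* flat global number, assigned by qemu */
            __u32 action;           /* one of the XICS_ACTION_* values */
    };

    struct xics_args_source_state {
            __u32 girq;
            __u64 state;            /* opaque, entirely chip-specific */
    };

    /* Create the IRQ chip, shoot an interrupt, save/restore one source */
    #define XICS_CREATE      _IOW ('x', 0x00, struct xics_args_create)
    #define XICS_TRIGGER     _IOW ('x', 0x01, struct xics_args_trigger)
    #define XICS_GET_SOURCE  _IOWR('x', 0x02, struct xics_args_source_state)
    #define XICS_SET_SOURCE  _IOW ('x', 0x03, struct xics_args_source_state)

Note there is deliberately no routing table anywhere in that picture:
qemu picks the global numbers (including for MSIs) and the kernel just
decodes them.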