Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

Jan Kiszka <jan.kiszka@xxxxxx> · Sat, 27 Oct 2012 10:06:45 +0200

On 2012-10-27 00:03, Benjamin Herrenschmidt wrote:
> On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
>>
>>> But we are just talking about sending messages from A to B or soldering
>>> an input to an output pin. That's pretty generic. Give each output event
>>> a virtual IRQ number and define where its output "line" should be linked
>>> to (input pin of target controller). All that will be specific are the
>>> IDs of those controllers.
>>
>> Hrm you seem to be saying something very different from Paolo here.
>> Unless it's just a very very confused terminology.
>>
>> So let's see the powerpc "pseries" case. Things like embedded etc...
>> might be quite different.
> 
> So I had a chat with Anthony who explained to me a bit more about what
> the x86 stuff is about. It's pretty horrible I must say :-)
> 
> So correct me if I'm wrong but you essentially have to differentiate
> between MSI "outputs" and other (GSI) "outputs" due to the fact that
> MSIs in x86 land don't act as normal interrupts going through a source
> controller but instead get shot directly to the target CPU.
> 
> Then you have to establish some kind of "routing" from those GSIs to
> some IO/APIC, and from MSIs to local APICs.

I'm afraid there are still some misconceptions about what is happening
on x86 and what role the bits play that are more generic.

The fact that we can inject MSI messages directly to the target APIC
doesn't affect the need to have IRQ routing support. That is used for
two reasons on x86:

 - define the wiring from a classic IRQ line to the various (legacy) IRQ
   controllers we have, namely the IOAPIC and the PIC
 - define the IRQ input that should be generated when an irqfd triggers
   (that's currently just irqfd->MSI associations, but irqfd->irqchip
   may come as well for vfio)

> 
> That's where I think there is a fairly fundamental difference with us.
> 
> So let's cut that problem in two. The GSI bit and the MSI bit. The
> reason is that the way x86 does MSIs seems to be fairly x86 specific, I
> wouldn't be surprised if everybody else did MSIs like we do them, that
> is turn them into normal interrupts (ie, GSIs). But let's discuss that
> below.
> 
> So the GSI bit. We can assume that GSIs in that context are basically
> our "global interrupt number". This would apply to pretty much every
> platform indeed.
> 
> The routing here, if I understand things correctly, consists of
> associating such a global interrupt number with a specific input pin (or
> virtual pin) of a specific source controller (ie, IO APIC).

...or PIC or whatever you have on your platform.

> 
> This would generally make sense in embedded space as well I suppose,
> where you can have multiple or even cascaded interrupt controllers of
> different breeds etc...
> 
> However, in the pseries system, this routing is essentially encoded in
> the interrupt number itself. As I think I explained earlier, the number
> is arbitrarily split in two parts, the top bits indicating the source
> controller and the bottom bits the source within that controller. In
> qemu/kvm we have made an arbitrary split (whose size I don't remember
> precisely) and we currently create only one fairly big source controller
> but we might change that in the future.
> 
> Thus there is no such thing as needing to "associate" or create routing
> entries here. qemu will directly shoot "GSIs" using an ioctl and our
> code can directly map that to a source controller without any routing
> table of any sort. In fact, adding one would complicate things since
> we'd have a requirement that it's populated 1:1 or things would get very
> confused indeed so overall, there's no point for us to implement or use
> that API or the "generic" code behind it, it would be pure bloat,
> complication and problems.

OK, that puts the IRQ injection IOCTL on pseries in the same category as
KVM_SIGNAL_MSI on x86. Both require no routing as the target address is
fully encoded and not otherwise dynamically remapped in the kernel.
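
To illustrate what "fully encoded" means here, a rough sketch of such a
split number (not the actual pseries code; the 12-bit source field and
all names are made up for the example, since the real split size was not
given in this thread):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: low SRC_BITS select the source within a controller,
 * the remaining high bits select the source controller. */
#define SRC_BITS 12u
#define SRC_MASK ((1u << SRC_BITS) - 1u)

struct irq_target {
    uint32_t controller; /* which source controller */
    uint32_t source;     /* which input of that controller */
};

/* Build a global interrupt number from its two parts. */
static inline uint32_t encode_global_irq(uint32_t controller, uint32_t source)
{
    assert(source <= SRC_MASK);
    return (controller << SRC_BITS) | source;
}

/* Recover the target from the number alone; no table lookup involved. */
static inline struct irq_target decode_global_irq(uint32_t irq)
{
    struct irq_target t = { irq >> SRC_BITS, irq & SRC_MASK };
    return t;
}
```

The point is only that the decode step is pure arithmetic, so there is
nothing for a routing table to add.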

> 
> However, making that code more generic might make sense for other
> platforms (including other powerpc platforms such as embedded) where
> multiple interrupt controllers may exist though here too, it's probably
> going to be fairly common that the GSI numbers are essentially a bit
> field split with entire ranges assigned to a given PIC. We don't have to
> emulate x86+ACPI ability to individually remap interrupts.

In KVM, a "GSI" is an index to its central IRQ routing table where the
target IRQ controller or MSI message is defined. Other interpretations
of the number passed down the IRQ injection IOCTL are of course possible
(like you do on pseries), but then it becomes arch-specific.
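
As a simplified model of that idea (plain C, not the kernel's data
structures; all names are invented for illustration, and real KVM can
attach more than one entry to a GSI, which this sketch leaves out):

```c
#include <stddef.h>
#include <stdint.h>

#define NR_GSI 8 /* arbitrary size for the example */

enum route_type { ROUTE_NONE = 0, ROUTE_IRQCHIP, ROUTE_MSI };

/* One routing entry: either an input pin of an IRQ chip or an MSI message. */
struct route {
    enum route_type type;
    union {
        struct { uint32_t chip, pin; } irqchip;
        struct { uint64_t address; uint32_t data; } msi;
    } u;
};

/* The "central IRQ routing table": the GSI is simply the index. */
static struct route routing_table[NR_GSI];

static int route_gsi_to_pin(uint32_t gsi, uint32_t chip, uint32_t pin)
{
    if (gsi >= NR_GSI)
        return -1;
    routing_table[gsi].type = ROUTE_IRQCHIP;
    routing_table[gsi].u.irqchip.chip = chip;
    routing_table[gsi].u.irqchip.pin = pin;
    return 0;
}

static int route_gsi_to_msi(uint32_t gsi, uint64_t address, uint32_t data)
{
    if (gsi >= NR_GSI)
        return -1;
    routing_table[gsi].type = ROUTE_MSI;
    routing_table[gsi].u.msi.address = address;
    routing_table[gsi].u.msi.data = data;
    return 0;
}

/* Returns the entry for a GSI, or NULL if nothing is routed there. */
static const struct route *lookup_route(uint32_t gsi)
{
    if (gsi >= NR_GSI || routing_table[gsi].type == ROUTE_NONE)
        return NULL;
    return &routing_table[gsi];
}
```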

> 
> The case of MSIs now. My understanding from what Anthony says is that
> your MSIs essentially bypass the IO APIC and route directly to the local
> APIC, which is equivalent to our presentation controller. You thus need
> specific APIs to associate an MSI (which isn't a GSI) to a specific
> local APIC.

First part is right, second part not. We only need special APIs on x86
for the legacy PCI assignment mess.

For MSI delivery, we now have KVM_SIGNAL_MSI, and - as indicated above -
that bypasses any routing. Only if your MSI event is delivered to the
kernel via a virtual IRQ line (which means irqfd today), you need a routing
table entry.
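
For reference, a minimal userspace sketch of the direct path. It assumes
a Linux host whose <linux/kvm.h> provides KVM_SIGNAL_MSI and a VM file
descriptor obtained elsewhere via KVM_CREATE_VM; the helper name
signal_msi() is made up for the example:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Inject one MSI message into the VM behind vm_fd, bypassing any routing
 * table. Returns the ioctl() result, which is negative on error. */
static int signal_msi(int vm_fd, uint64_t address, uint32_t data)
{
    struct kvm_msi msi;

    memset(&msi, 0, sizeof(msi)); /* flags and padding stay zero */
    msi.address_lo = (uint32_t)address;
    msi.address_hi = (uint32_t)(address >> 32);
    msi.data = data;

    return ioctl(vm_fd, KVM_SIGNAL_MSI, &msi);
}
```

Note that the message itself carries the complete target, so no GSI
number appears anywhere in the call.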

> 
> We have no such need at all. Our MSIs are decoded by the PCI host bridge
> and directly turned into "normal" interrupts. In fact, in HW, our bridges
> contain a special source controller that *is* essentially the thing that
> gets hit by MSIs. 
> 
> So our MSIs are just normal interrupts in the global space. Their
> numbers are assigned by qemu, the kernel never knows about them. When an
> emulated device triggers an MSI that turns into a normal "trigger global
> interrupt X" ioctl to the kernel. The only "knowledge" the kernel
> emulation gets along the way is an argument to the ioctl that indicates
> whether this is a level set, level reset, or edge type action (MSIs are
> edge obviously) which dictates how the delivery state machine will work
> (one shot vs. continuous until cleared).
> 
> So qemu assigns interrupt numbers to MSIs and there's never any routing
> to establish at the kernel level. That also means that the current API
> that has tendrils all the way into devices in qemu for "getting the virq
> for a given MSI" is totally unsuitable for us.

Just like it is for x86 when the MSI event is generated in userspace -
and that's why we avoid it in that case.

> In fact we don't need a
> different API for KVM vs. full emulation. Everything in qemu side is the
> same, until the qirq gets actually delivered in which case with KVM
> we'll shoot an ioctl rather than emulating the source controller.
> 
> So the only APIs we need are these:
> 
>  - Create the IRQ chips themselves
>  - Shoot an interrupt
>  - Save and Restore of individual source state for migration
> 
> (The content of the state is very specific to a given IRQ chip
> implementation).
> 
> I fail to see how we can shoe horn any of that in generic code, it
> doesn't fit the model you currently have at all and making it do so
> would add bloat and complexity without any benefit. IE. We wouldn't
> "share" code, we would "add" code not otherwise useful.

I agree that you have no logical need for IRQ routing. Still, if you
want to use irqfd without reimplementing it for pseries, you will have
to populate a GSI routing table. That's because irqfd injects a "GSI"
(via kvm_set_irq), and that is dispatched according to the routing table.

So you need to define for the GSI number of an irqfd which pseries
global IRQ number should be generated. There is a bit of refactoring
needed in irq_comm.c to pull out remaining x86 bits and place some arch
callbacks, e.g. in setup_routing_entry so that you can hook up your
specific IRQ handler.
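
Roughly, the flow would look like the following toy model. This is not
kernel code: every name below is invented for illustration, and the
per-entry callback merely stands in for whatever arch hook the
refactoring would end up providing.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GSI 16 /* arbitrary size for the example */

struct route_entry;
typedef int (*route_set_fn)(const struct route_entry *e, int level);

struct route_entry {
    route_set_fn set;  /* arch-specific delivery hook, NULL if unrouted */
    uint32_t arch_irq; /* e.g. a pseries-style global interrupt number */
};

static struct route_entry routes[MAX_GSI];
static uint32_t last_raised_irq; /* stand-in for the in-kernel IRQ chip */

/* Hypothetical arch handler: "shoots" the global interrupt number. */
static int pseries_like_set(const struct route_entry *e, int level)
{
    if (level)
        last_raised_irq = e->arch_irq;
    return 0;
}

/* Populate one routing entry: GSI -> arch handler + arch IRQ number. */
static int setup_route(uint32_t gsi, route_set_fn set, uint32_t arch_irq)
{
    if (gsi >= MAX_GSI)
        return -1;
    routes[gsi].set = set;
    routes[gsi].arch_irq = arch_irq;
    return 0;
}

/* Generic dispatch: look up the GSI and call whatever handler is hooked. */
static int set_irq(uint32_t gsi, int level)
{
    if (gsi >= MAX_GSI || routes[gsi].set == NULL)
        return -1;
    return routes[gsi].set(&routes[gsi], level);
}

/* An irqfd only knows its GSI; everything else comes from the table. */
struct irqfd_model { uint32_t gsi; };

static int irqfd_trigger(const struct irqfd_model *fd)
{
    return set_irq(fd->gsi, 1);
}
```

In this model the irqfd side stays fully generic, and the only
arch-specific piece is the handler installed when the entry is set up.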

Options for configuring IRQ chips from userspace were discussed already.

> 
> The ARM situation might be different (and the powerpc situation for
> other platforms such as mac99 and embedded) in that there might be some
> value in having that GSI -> PIC input mapping, but here too I tend to
> doubt it. We are probably better off starting with a cleaner slate
> without the gross x86 baggage and use a unified flat number space in the
> kernel, leaving all the complication of who is connected to whom to
> qemu.

Just because the specialties of pseries leave little use for IRQ routing,
you shouldn't conclude that it is useless for all other archs. The
interfaces we have are generic enough, and there is no excuse to ignore
them without actually having tried them on ARM or embedded Power.

Jan
