On 2012-10-26 13:39, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
>>> Well, that's the thing, I haven't managed to figure that out so far,
>>> it looks very x86-specific to me. To begin with, there's no such
>>> thing as a "GSI" in our world.
>>
>> This was roughly the feeling I had looking at these APIs. There
>> might be some underlying generic concept, but there is a definite
>> tendency for the surface representation to use x86-specific
>> terminology, to the extent that you can't tell whether an API
>> is x86-specific or merely apparently so...
>
> Right. Which is why I'm sure I'm actually missing something there :-)
> And I'm hoping Paolo and Jan will help shed some light.
>
> It might help if somebody could explain a bit more what a GSI is in
> x86 land and how it relates to the various APICs, along with what
> exactly they mean by "routing", i.e. what are the different elements
> that get associated. Basically, if somebody could describe how the
> x86 stuff works, that might help.
>
> From my view of things, we have various "sources" of interrupts. On
> my list are emulated device LSIs and emulated device MSIs, both in
> qemu, then vhost and finally pass-through; I suppose on some
> platforms IPIs come in as well. Those "sources" need, one way or
> another, to hit a source controller which will then itself, in a
> platform-specific way, shoot the interrupt to a presentation
> controller.
>
> The routing between source and presentation controllers is fairly
> platform-specific as far as I can tell, even within a given CPU
> family. I.e. the way an OpenPIC (aka MPIC, used on Macs) does it is
> different from the way the XICS system does it on pseries, and is
> different from most embedded stuff (which typically doesn't have
> that source/presentation distinction but just cascaded dumber PICs).
> The amount of configurability, the type of configuration
> information, etc. that governs such a layout is also very specific
> to the platform and the type of interrupt controller system used on
> it.

But we are just talking about sending messages from A to B or
soldering an input to an output pin. That's pretty generic: give each
output event a virtual IRQ number and define where its output "line"
should be linked to (the input pin of the target controller). All that
will be specific are the IDs of those controllers.

Of course, all that presumes you do their emulation in kernel space.
For x86, that even makes sense when the IRQ sources are in user space,
as the guest may still have to interact with the IOAPIC during IRQ
delivery, so we save some costly heavy-weight exits by putting it in
the kernel.

>
> Remains the "routing" between sources of "events" and actual
> "inputs" to a source controller.
>
> This too doesn't seem totally obvious to generalize. For example, an
> embedded platform with a bunch of cascaded dumb interrupt
> controllers doesn't have a concept of a flat number space in HW; an
> interrupt "input", to be identified properly, needs to identify the
> controller and the interrupt within that controller. However, within
> KVM/qemu, it's pretty easy to assign each controller a number and,
> by collating the two, get some kind of flat space, though it's not
> arbitrary and the routing is thus fairly constrained if not totally
> fixed.

An IRQ routing entry consists of:
 - a virq number ("gsi")
 - a type (controller ID, MSI, whatever you like)
 - some flags (to extend it)
 - type-specific data (MSI message, controller input pin, etc.)
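That entry maps almost field-for-field onto struct
kvm_irq_routing_entry in the kernel's <linux/kvm.h>, programmed via
KVM_SET_GSI_ROUTING. A minimal sketch, not a drop-in implementation:
vm_fd is assumed to be a VM descriptor with an in-kernel irqchip
already created, and the chip IDs are the x86 ones. The two entries
deliberately share one virq, to show the multi-target delivery
described next:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_routing(int vm_fd)
{
    struct kvm_irq_routing *table;
    int ret;

    /* The table is variable-length: a header plus nr entries. */
    table = calloc(1, sizeof(*table) + 2 * sizeof(table->entries[0]));
    if (!table)
        return -1;
    table->nr = 2;

    /* Entry 0: virq ("gsi") 4 -> pin 4 of the in-kernel PIC. */
    table->entries[0].gsi = 4;
    table->entries[0].type = KVM_IRQ_ROUTING_IRQCHIP;
    table->entries[0].u.irqchip.irqchip = KVM_IRQCHIP_PIC_MASTER;
    table->entries[0].u.irqchip.pin = 4;

    /* Entry 1: same virq, second target -> pin 4 of the IOAPIC. */
    table->entries[1].gsi = 4;
    table->entries[1].type = KVM_IRQ_ROUTING_IRQCHIP;
    table->entries[1].u.irqchip.irqchip = KVM_IRQCHIP_IOAPIC;
    table->entries[1].u.irqchip.pin = 4;

    /* KVM_SET_GSI_ROUTING replaces the whole table in one call. */
    ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
    free(table);
    return ret;
}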
And there can be multiple entries with the same virq, thus you can
deliver to multiple targets. I bet you can model quite a lot of your
platform-specific routing this way. I'm not saying our generic code
will work out of the box, but at least the interfaces and concepts are
there.

>
> In the pseries case, the global number is split in two bit fields,
> the BUID identifying the specific source controller and the source
> within that controller. Here too it's fairly fixed though. So the
> ioctl we use to create a source controller in the kernel takes the
> BUID as an argument, and from there the kernel will "find" the right
> source controller based solely on the interrupt number.
>
> So basically on one side we have a global interrupt number that
> identifies an "input". I assume that's what x86 calls a GSI?

Right. The virtual IRQ number space we call "GSI" is partially
occupied by the actual x86 GSIs (0..n, with n=23 so far), directed to
the IOAPIC and PIC there, and then followed by IRQs that are mapped
onto MSI messages. But that's just how we _use_ it on x86, not how it
has to work for other archs.

>
> Remains how to associate the various sources of interrupts with that
> "global number"... and that is fairly specific to each source type,
> isn't it?
>
> In our current powerpc code, the emulated devices toggle the qirq,
> which ends up shooting an ioctl to set/reset or "message" (for MSIs)
> the corresponding global interrupt. The mapping is established
> entirely within qemu; we just tell the kernel to trigger a given
> interrupt.
>
> We haven't really sorted vhost out yet, so I'm not sure how that
> will work out, but the idea would be to have an ioctl to associate
> an eventfd or whatever vhost uses as interrupt "outputs" with a
> global interrupt number.

KVM_IRQFD is already there. It associates an irqfd file descriptor
with a virtual IRQ. Once that triggers, the IRQ routing table is used
to define the actual interrupt type and destination chip to use, see
above.

>
> For pass-through, currently our VFIO is dumb: interrupts get to
> qemu, which then shoots them back to the kernel using the standard
> qirq stuff used by emulated devices. Here I suppose we would want
> something similar to vhost to associate the VFIO irq fd with a
> "global number".
>
> Is that what the existing ioctls provide? Their semantics aren't
> totally obvious to me.

Provided you want to trigger an MSI message, you first need to
register it via kvm_irqchip_add_msi_route (which will trigger
KVM_SET_GSI_ROUTING). That will give you a virtual IRQ number which
can be associated with an irqfd file descriptor as explained above
(KVM_IRQFD). But you may also create a different kind of routing table
entry if MSI is not all you need to inject via irqfd; it could be a
plain IRQ line as well, routed to a specific in-kernel IRQ controller
model.
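Concretely, that association is one more ioctl. A sketch of binding an
eventfd to a virq obtained as above (the virq returned by
kvm_irqchip_add_msi_route, or any entry programmed via
KVM_SET_GSI_ROUTING); vm_fd is again an assumed VM descriptor and
error handling is trimmed:

#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int attach_irqfd(int vm_fd, int virq)
{
    struct kvm_irqfd irqfd;
    int efd = eventfd(0, 0);   /* the "output" vhost/VFIO will signal */

    if (efd < 0)
        return -1;

    memset(&irqfd, 0, sizeof(irqfd));
    irqfd.fd = efd;
    irqfd.gsi = virq;  /* selects the routing entry (or entries) to fire */

    /* From now on, any write to efd injects the routed interrupt
     * entirely within the kernel; KVM_IRQFD_FLAG_DEASSIGN undoes the
     * binding. */
    return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}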
>
> Note that for pass-through at least, and possibly for vhost, we'd
> like to actually totally bypass the irqfd & eventfd stuff for
> performance reasons. At least for VFIO, if we are going to get the
> max performance out of it, we need to take all generic code out of
> the picture. I.e. if the interrupts are routed to the physical CPU
> where the guest is running, we want to be able to catch and
> distribute the interrupts to the guest entirely within guest
> context, i.e. with KVM arch-specific low-level code that runs in
> "real mode" (i.e. MMU off) without context switching the MMU back to
> the host, which on POWER is fairly costly.
>
> That means that at least the association between a guest global
> interrupt number and a host global interrupt number for pass-through
> will be something that goes entirely through an arch-specific code
> path. We might still be able to use generic APIs to establish it if
> they are suitable, though.

The same will happen on x86: direct injection to a target VCPU. Maybe
again a topic for our IRQ routing table, just with specialized target
types.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux