On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
> > But we are just talking about sending messages from A to B or soldering
> > an input to an output pin. That's pretty generic. Give each output event
> > a virtual IRQ number and define where its output "line" should be linked
> > to (input pin of target controller). All what will be specific are the
> > IDs of those controllers.
>
> Hrm you seem to be saying something very different from Paolo here.
> Unless it's just a very very confused terminology.
>
> So let's see the powerpc "pseries" case. Things like embedded etc...
> might be quite different.

So I had a chat with Anthony who explained to me a bit more about what
the x86 stuff is about. It's pretty horrible I must say :-)

So correct me if I'm wrong, but you essentially have to differentiate
between MSI "outputs" and other (GSI) "outputs" due to the fact that
MSIs in x86 land don't act as normal interrupts going through a source
controller but instead get shot directly to the target CPU. Then you
have to establish some kind of "routing" from those GSIs to some IO
APIC, and from MSIs to local APICs. That's where I think there is a
fairly fundamental difference with us.

So let's cut that problem in two: the GSI bit and the MSI bit. The
reason is that the way x86 does MSIs seems to be fairly x86 specific; I
wouldn't be surprised if everybody else did MSIs like we do them, that
is, turn them into normal interrupts (ie, GSIs). But let's discuss that
below.

So the GSI bit. We can assume that GSIs in that context are basically
our "global interrupt numbers". This would apply to pretty much every
platform indeed. The routing here, if I understand things correctly,
consists of associating such a global interrupt number with a specific
input pin (or virtual pin) of a specific source controller (ie, IO
APIC). This would generally make sense in embedded space as well I
suppose, where you can have multiple or even cascaded interrupt
controllers of different breeds etc...

However, on the pseries system, this routing is essentially encoded in
the interrupt number itself. As I think I explained earlier, the number
is arbitrarily split in two parts, the top bits indicating the source
controller and the bottom bits the source within that controller
(sketched below). In qemu/kvm we have made an arbitrary split (whose
size I don't remember precisely) and we currently create only one
fairly big source controller, but we might change that in the future.

Thus there is no such thing as needing to "associate" or create routing
entries here. qemu will directly shoot "GSIs" using an ioctl and our
code can directly map that to a source controller without any routing
table of any sort. In fact, adding one would complicate things, since
we'd have a requirement that it's populated 1:1 or things would get
very confused indeed. So overall, there's no point for us to implement
or use that API or the "generic" code behind it; it would be pure
bloat, complication and problems.

However, making that code more generic might make sense for other
platforms (including other powerpc platforms such as embedded) where
multiple interrupt controllers may exist, though here too, it's
probably going to be fairly common that the GSI numbers are essentially
a bit field split with entire ranges assigned to a given PIC. We don't
have to emulate the x86+ACPI ability to individually remap interrupts.
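To make the above a bit more concrete, the whole "routing" on pseries
boils down to something like the sketch below. The names and the split
width are placeholders I'm making up for illustration (as said, I don't
remember the exact value we use in qemu/kvm):

    /*
     * Illustration only: ICS_SHIFT and the helper names are made up,
     * not the actual qemu/kvm values.  The point is that the "routing"
     * is nothing more than a shift and a mask on the flat global
     * interrupt number.
     */
    #define ICS_SHIFT      12                      /* placeholder split point */
    #define ICS_SRC_MASK   ((1u << ICS_SHIFT) - 1)

    static unsigned int girq_to_ics(unsigned int girq)
    {
            return girq >> ICS_SHIFT;       /* which source controller */
    }

    static unsigned int girq_to_src(unsigned int girq)
    {
            return girq & ICS_SRC_MASK;     /* source within that controller */
    }

A routing table would just have to be populated 1:1 with exactly that
information, which is why it buys us nothing.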
The case of MSIs now.

My understanding from what Anthony says is that your MSIs essentially
bypass the IO APIC and route directly to the local APIC, which is
equivalent to our presentation controller. You thus need specific APIs
to associate an MSI (which isn't a GSI) with a specific local APIC.

We have no such need at all. Our MSIs are decoded by the PCI host
bridge and directly turned into "normal" interrupts. In fact, in HW,
our bridges contain a special source controller that *is* essentially
the thing that gets hit by MSIs. So our MSIs are just normal interrupts
in the global space. Their numbers are assigned by qemu, the kernel
never knows about them. When an emulated device triggers an MSI, that
turns into a normal "trigger global interrupt X" ioctl to the kernel.
The only "knowledge" the kernel emulation gets along the way is an
argument to the ioctl that indicates whether this is a level set, level
reset, or edge type action (MSIs are edge, obviously), which dictates
how the delivery state machine will work (one shot vs. continuous until
cleared).

So qemu assigns interrupt numbers to MSIs and there's never any routing
to establish at the kernel level. That also means that the current API
that has tendrils all the way into devices in qemu for "getting the
virq for a given MSI" is totally unsuitable for us. In fact we don't
need a different API for KVM vs. full emulation. Everything on the qemu
side is the same until the qirq actually gets delivered, in which case
with KVM we'll shoot an ioctl rather than emulating the source
controller.

So the only APIs we need are these:

 - Create the IRQ chips themselves

 - Shoot an interrupt

 - Save and restore of individual source state for migration (the
   content of the state is very specific to a given IRQ chip
   implementation)

I fail to see how we can shoehorn any of that into generic code. It
doesn't fit the model you currently have at all, and making it do so
would add bloat and complexity without any benefit. IE. we wouldn't
"share" code, we would "add" code not otherwise useful.

The ARM situation might be different (and the powerpc situation for
other platforms such as mac99 and embedded) in that there might be some
value in having that GSI -> PIC input mapping, but here too I tend to
doubt it. We are probably better off starting with a cleaner slate
without the gross x86 baggage and using a unified flat number space in
the kernel, leaving all the complication of who is connected to whom to
qemu.

Cheers,
Ben.
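P.S. For the sake of concreteness, here is roughly what that minimal
surface could look like. Everything below (names, structures, ioctl
numbers) is made up purely for illustration; it is not the existing KVM
ABI, just a sketch of the three operations listed above:

    /* Sketch only -- none of these names or numbers are real KVM ABI. */
    #include <linux/types.h>
    #include <linux/ioctl.h>

    /* Action passed along with "shoot global interrupt X" */
    #define XICS_ACTION_LEVEL_SET    0      /* assert a level source */
    #define XICS_ACTION_LEVEL_RESET  1      /* de-assert it */
    #define XICS_ACTION_EDGE         2      /* one-shot; MSIs end up here */

    struct xics_args_create {
            __u32 nr_sources;       /* size of the (single, for now) chip */
    };

    struct xics_args_trigger {
            __u32 girq;             /* flat global number, assigned by qemu */
            __u32 action;           /* one of the XICS_ACTION_* values */
    };

    struct xics_args_source_state {
            __u32 girq;
            __u64 state;            /* opaque, entirely chip-specific */
    };

    /* Create the IRQ chip, shoot an interrupt, save/restore one source */
    #define XICS_CREATE      _IOW ('x', 0x00, struct xics_args_create)
    #define XICS_TRIGGER     _IOW ('x', 0x01, struct xics_args_trigger)
    #define XICS_GET_SOURCE  _IOWR('x', 0x02, struct xics_args_source_state)
    #define XICS_SET_SOURCE  _IOW ('x', 0x03, struct xics_args_source_state)

Note there is deliberately no routing table anywhere in that picture:
qemu picks the global numbers (including for MSIs) and the kernel just
decodes them.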