Re: in-kernel interrupt controller steering

Alexander Graf <agraf@xxxxxxx> · Wed, 6 Mar 2013 15:48:54 +0100

On 06.03.2013, at 15:41, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 14:56, Gleb Natapov wrote:
>> 
>>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>>>> 
>>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>>> 
>>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>>>>>> 
>>>>> For x86 this is the case though. I do not see how it can't be. If
>>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>>>> vector that should be handled as a result of LAPIC emulation.
>>>> 
>>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
>>>> 
>>> No, SET_INTERRUPT on a vcpu tells the vcpu which vector in the IDT it needs
>>> to jump to immediately. LAPIC is really part of a cpu and we cut it out and
>>> put it into userspace, so the interface to the userspace LAPIC emulation is
>>> really low level and has to be synchronous. X86 has two interrupt lines, NMI
>>> and INTR, and we do not have an interface to trigger the latter. KVM_IRQ_LINE
>>> works on GSI lines which do not go into the CPU directly. They go either via
>>> the PIC (which triggers INTR or APIC LINT0) or via the IOAPIC, which on real
>>> HW communicates with the APICs via a bus, but in our emulation just calls the
>>> APICs directly.
>> 
>> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
>> 
>> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
>> 
> Is it possible for some devices to inject interrupts directly and others
> to go through the interrupt controller?

It would be racy if both assert and deassert the same line, but I don't see why we should keep anyone from doing it. If user space wants to run such a configuration, it needs to ensure that only one of the two is actively used at any given time.
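
To spell out the race, here is a tiny sketch in plain C. It is a toy model and not KVM code: struct irq_line, line_set() and lost_assert_demo() are made-up names, and it assumes a level-triggered input where the last writer simply wins.

```c
#include <assert.h>

/* Hypothetical model, not KVM code: a level-triggered vcpu input where
 * whoever writes last wins, as with two unsynchronized users of one line. */
struct irq_line { int level; };

void line_set(struct irq_line *line, int level)
{
    assert(level == 0 || level == 1);
    line->level = level;
}

/* Returns the final line level after one bad interleaving. */
int lost_assert_demo(void)
{
    struct irq_line intr = { 0 };

    line_set(&intr, 1);   /* in-kernel controller asserts for device A */
    line_set(&intr, 1);   /* user space asserts directly for device B  */
    line_set(&intr, 0);   /* controller deasserts once A is serviced   */

    return intr.level;    /* B is still pending, but the line reads low */
}
```

Device B's interrupt is lost in this interleaving, which is why only one of the two paths should drive the line at any given time.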

>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
> Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
> X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
> if it goes through PIC, so an additional entity was invented: irq routing.
> It maps between GSIs and irqchip pins. The same GSI may go to more than one
> irqchip. This is why for x86 having an irqchip device handle as a parameter
> to KVM_IRQ_LINE does not make sense. It makes sense to provide it to the irq
> router, and this is how it works now, except that the "device handles" are
> hard coded.

Then you would create a new "irq router" device that does the multiplexing and can also receive IRQs. You could then directly assert an IOAPIC/PIC line or a multiplexer line. Or am I misunderstanding something?
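
As a rough illustration of what such a multiplexer would do, here is a user-space sketch in plain C. It is not the KVM implementation: the enum, the table and router_set_gsi() are hypothetical names, and using GSI 0 for the "IRQ0 on the PIC, pin 2 on the IOAPIC" case is my assumption, since the thread only says that such a GSI exists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT the KVM implementation: all names here are hypothetical. */
enum chip { CHIP_PIC, CHIP_IOAPIC, CHIP_COUNT };

struct route {
    unsigned gsi;    /* number user space passes to the router */
    enum chip chip;  /* irqchip the GSI is wired to            */
    unsigned pin;    /* input pin on that irqchip              */
};

#define MAX_PINS 24

int pin_level[CHIP_COUNT][MAX_PINS];

/* Assumed example table: GSI 0 reaches the PIC as IRQ0 and the IOAPIC as pin 2. */
const struct route routes[] = {
    { 0, CHIP_PIC,    0 },
    { 0, CHIP_IOAPIC, 2 },
    { 1, CHIP_PIC,    1 },
    { 1, CHIP_IOAPIC, 1 },
};

/* Drive every pin the GSI is routed to; returns the number of pins driven. */
int router_set_gsi(unsigned gsi, int level)
{
    int hits = 0;

    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        if (routes[i].gsi != gsi)
            continue;
        assert(routes[i].pin < MAX_PINS);
        pin_level[routes[i].chip][routes[i].pin] = level;
        hits++;
    }
    return hits;
}
```

With a router like this, the caller only needs a handle to the router itself. Asserting one irqchip's pin directly would still be possible by talking to that irqchip instead.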

> 
>> But it's the same idea. The "IOAPIC" would then talk to the in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
> The restriction that LAPIC has to be created before IOAPIC would be a
> bug that needs to be fixed on X86. The reason is cpu hotplug. If you have
> to support cpu hotplug you have to be able to create LAPICs after IOAPIC
> and at this point you can create IOAPIC before any LAPICs as well. I
> understand this may not be the case for all architectures right now, but
> it is something to keep in mind.

Paul, Scott, do you think we can move the "this CPU can receive interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That ENABLE_CAP would allocate the structures in the vcpu and register the vcpu with the interrupt controller pool.

The interrupt controller device would still iterate through all vcpus to find the ones that match so that we support the ENABLE_CAP at any point in time.
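
Roughly what I have in mind, as a user-space toy in plain C. None of this is real KVM code: struct vm, vcpu_enable_irqchip_cap() and irqchip_broadcast() are hypothetical names, and broadcasting to every registered vcpu is just the simplest delivery policy to show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCPUS 8

/* Toy model, NOT KVM code: every name below is hypothetical. */
struct vcpu {
    bool present;      /* vcpu has been created                    */
    bool irqchip_cap;  /* stand-in for the proposed ENABLE_CAP bit */
    int pending_irq;   /* -1 means nothing pending                 */
};

struct vm { struct vcpu vcpus[MAX_VCPUS]; };

void vcpu_create(struct vm *vm, size_t idx)
{
    assert(idx < MAX_VCPUS);
    vm->vcpus[idx].present = true;
    vm->vcpus[idx].irqchip_cap = false;
    vm->vcpus[idx].pending_irq = -1;
}

/* May be called at any time after vcpu_create(), e.g. on cpu hotplug. */
void vcpu_enable_irqchip_cap(struct vm *vm, size_t idx)
{
    assert(idx < MAX_VCPUS && vm->vcpus[idx].present);
    vm->vcpus[idx].irqchip_cap = true;
}

/* The controller keeps no vcpu list of its own: it walks all vcpus on
 * every delivery, so a capability enabled later is picked up for free.
 * Returns the number of vcpus the interrupt was delivered to. */
int irqchip_broadcast(struct vm *vm, int irq)
{
    int delivered = 0;

    for (size_t i = 0; i < MAX_VCPUS; i++) {
        struct vcpu *v = &vm->vcpus[i];

        if (!v->present || !v->irqchip_cap)
            continue;
        v->pending_irq = irq;
        delivered++;
    }
    return delivered;
}
```

Because the lookup happens at delivery time, the order in which the controller and the vcpus get created stops mattering, which would also cover the hotplug case.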

> 
>> 
>> So again, I'm failing to see where we think differently :).
>> 
> The difference is very minor really. I am still trying to justify to myself
> why we need a separate ioctl() to announce what irqchip we are going to
> create before creating one (except to save QEMU some trouble). The question
> is: can this ioctl be useful by itself? The seemingly unlikely scenario
> where we allow IOAPIC/PIC emulation in userspace while LAPIC is in the
> kernel may be such a case. QEMU will call it before creating vcpus to
> tell KVM that LAPICs need to be created along with VCPUs, but no
> irqchip will be created.

I don't have a real answer for you yet, but so far the general design mantra of "small, individual pieces that plug together" worked out way better for us than the "have one call that does it all" one. Being explicit simply makes sure that we support more scenarios we don't think of today.

Alex
