Re: [PATCH] target/i386: Support up to 32768 CPUs without IRQ remapping

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Thu, 8 Oct 2020 09:53:47 +0200

On 08/10/20 09:29, David Woodhouse wrote:
> On Thu, 2020-10-08 at 08:56 +0200, Paolo Bonzini wrote:
>> On 05/10/20 16:18, David Woodhouse wrote:
>>> +        if (kvm_irqchip_is_split()) {
>>> +            ret |= 1U << KVM_FEATURE_MSI_EXT_DEST_ID;
>>> +        }
>>
>> IIUC this is because in-kernel IOAPIC still doesn't work; and when it
>> does, KVM will advertise the feature itself so no other QEMU changes
>> will be needed.
> 
> More the MSI handling than the IOAPIC. I haven't actually worked out
> *what* handles cycles to addresses in the 0xFEExxxxx range for the in-
> kernel irqchip and turns them into interrupts (after putting them
> through interrupt remapping, if/when the kernel learns to do that).

That's easy: it's QEMU. :)  See kvm_apic_mem_write in hw/i386/kvm/apic.c
(note that this memory region is never used when the CPU accesses
0xFEExxxxx, only when QEMU does).

Conversion from the IOAPIC and MSI formats to struct kvm_lapic_irq is
completely separate in KVM: it is done in ioapic_service and
kvm_set_msi_irq, respectively.  Both of them prepare a struct
kvm_lapic_irq, but they're two different paths.

> Ideally the IOAPIC would just swizzle the bits in its RTE to create an
> MSI message and pass it on to the same code to be (translated and)
> delivered.
> 
> You'll note my qemu patch didn't touch IOAPIC code at all, because
> qemu's IOAPIC really does just that.

Indeed the nice thing about irqchip=split is that the handling of device
interrupts is entirely confined within QEMU, no matter if they're IOAPIC
or MSI.  And because we had to implement interrupt remapping, the IOAPIC
is effectively using MSIs to deliver its interrupts.
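
To make the "swizzle" concrete, here is a rough sketch of turning an
IOAPIC RTE into an MSI address/data pair.  It is not QEMU's actual
code: the struct and function names are made up for illustration, and
the bit positions are the usual compatibility-format ones as I
understand them (RTE bits 63:48 copied into MSI address bits 19:4,
destination mode into address bit 2, vector, delivery mode and trigger
mode into the data word).

```c
#include <stdint.h>

/* Hypothetical type and names, for illustration only. */
struct msi_msg_sketch {
    uint32_t addr;  /* low 32 bits of the MSI address */
    uint32_t data;  /* MSI data word */
};

#define SKETCH_MSI_BASE 0xFEE00000u

/* Rearrange the fields of a 64-bit IOAPIC RTE into an MSI message. */
static struct msi_msg_sketch rte_to_msi_sketch(uint64_t rte)
{
    struct msi_msg_sketch msg;
    uint32_t vector    = (uint32_t)(rte & 0xff);           /* RTE bits 7:0   */
    uint32_t delivery  = (uint32_t)((rte >> 8) & 0x7);     /* RTE bits 10:8  */
    uint32_t dest_mode = (uint32_t)((rte >> 11) & 0x1);    /* RTE bit 11     */
    uint32_t trigger   = (uint32_t)((rte >> 15) & 0x1);    /* RTE bit 15     */
    uint32_t dest_bits = (uint32_t)((rte >> 48) & 0xffff); /* RTE bits 63:48 */

    /* Destination bits land in address bits 19:4, dest mode in bit 2. */
    msg.addr = SKETCH_MSI_BASE | (dest_bits << 4) | (dest_mode << 2);
    /* Vector in data bits 7:0, delivery mode in 10:8, trigger in bit 15. */
    msg.data = vector | (delivery << 8) | (trigger << 15);
    return msg;
}
```

Once the RTE is in this form, the rest of the delivery path does not
need to know whether the interrupt came from the IOAPIC or from a
device MSI.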

There's still the hack to communicate IOAPIC routes to KVM and have it
set the EOI exit bitmap correctly, though.  The code is in
kvm_scan_ioapic_routes and it uses kvm_set_msi_irq (with irqchip=split
everything is also an MSI within the kernel).  I think you're not
handling that correctly for CPUs >255, so after all we _do_ need some
kernel support.
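
For reference, a minimal sketch of the destination ID decode that any
such path would have to get right.  This assumes the layout proposed
in this series, as I understand it: the low 8 bits of the destination
ID in MSI address bits 19:12 and the extended bits 14:8 in address
bits 11:5.  The function name is hypothetical, not an existing KVM or
QEMU helper.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the destination APIC ID from the low
 * 32 bits of an MSI address.
 */
static uint32_t msi_dest_id_sketch(uint32_t addr_lo, bool ext_dest_id)
{
    /* Address bits 19:12 carry destination ID bits 7:0. */
    uint32_t dest = (addr_lo >> 12) & 0xff;

    if (ext_dest_id) {
        /* Assumed layout: address bits 11:5 carry ID bits 14:8. */
        dest |= ((addr_lo >> 5) & 0x7f) << 8;
    }
    return dest;
}
```

With 15 bits of destination ID that gives the 32768 CPUs from the
subject line; a decoder that only looks at bits 19:12 silently
truncates anything above 255.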

Paolo

>> I queued this, though of course it has to wait for the corresponding
>> kernel patches to be accepted (or separated into doc and non-KVM
>> parts; we'll see).
> 
> Thanks.
>