On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote: > Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs > for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to > honor x86 architectural behavior if multiple vCPUs shared a physical APIC > ID. As called out in the changelog that added the hack, all CPUs whose > (possibly truncated) APIC ID matches the target are supposed to receive > the IPI. > > KVM intentionally differs from real hardware, because real hardware > (Knights Landing) does just "x2apic_id & 0xff" to decide whether to > accept the interrupt in xAPIC mode and it can deliver one interrupt to > more than one physical destination, e.g. 0x123 to 0x123 and 0x23. > > Applying the hack even when x2APIC is not fully enabled means KVM doesn't > correctly handle scenarios where the guest has aliased xAPIC IDs across > multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any > interrupts. It's extremely unlikely any real world guest aliase APIC IDs, > or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the > lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which > vCPU is "normal". > > Furthermore, the hack is _not_ guaranteed to work! The hack works if and > only if the optimized APIC map is successfully allocated. If the map > allocation fails (unlikely), KVM will fall back to its unoptimized > behavior, which _does_ honor the architectural behavior. > > Pivot on 32-bit x2APIC IDs being enabled as that is required to take > advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't > break existing setups unless they are way, way off in the weeds. > > And an entry in KVM's errata to document the hack. Alternatively, KVM > could provide an actual x2APIC quirk and document the hack that way, but > there's unlikely to ever be a use case for disabling the quirk. Go the > errata route to avoid having to validate a quirk no one cares about. > > Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff") > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > Documentation/virt/kvm/x86/errata.rst | 11 ++++++ > arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++----- > 2 files changed, 52 insertions(+), 9 deletions(-) > > diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst > index 410e0aa63493..49a05f24747b 100644 > --- a/Documentation/virt/kvm/x86/errata.rst > +++ b/Documentation/virt/kvm/x86/errata.rst > @@ -37,3 +37,14 @@ Nested virtualization features > ------------------------------ > > TBD > + > +x2APIC > +------ > +When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that > +allows sending events to a single vCPU using its x2APIC ID even if the target > +vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI > +on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs > +have the same physical APIC ID, KVM will deliver events targeting that APIC ID > +only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is > +not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs > +matching the target APIC ID receive the interrupt). > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index d537b51295d6..c224b5c7cd92 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -260,10 +260,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm) > kvm_for_each_vcpu(i, vcpu, kvm) { > struct kvm_lapic *apic = vcpu->arch.apic; > struct kvm_lapic **cluster; > + u32 x2apic_id, physical_id; > u16 mask; > u32 ldr; > u8 xapic_id; > - u32 x2apic_id; > > if (!kvm_apic_present(vcpu)) > continue; > @@ -271,16 +271,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm) > xapic_id = kvm_xapic_id(apic); > x2apic_id = kvm_x2apic_id(apic); > > - /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */ > - if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) && > - x2apic_id <= new->max_apic_id) > - new->phys_map[x2apic_id] = apic; > /* > - * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around, > - * prevent them from masking VCPUs with APIC ID <= 0xff. > + * Apply KVM's hotplug hack if userspace has enable 32-bit APIC > + * IDs. Allow sending events to vCPUs by their x2APIC ID even > + * if the target vCPU is in legacy xAPIC mode, and silently > + * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8 > + * bits, causing IDs > 0xff to wrap and collide). > + * > + * Honor the architectural (and KVM's non-optimized) behavior > + * if userspace has not enabled 32-bit x2APIC IDs. Each APIC > + * is supposed to process messages independently. If multiple > + * vCPUs have the same effective APIC ID, e.g. due to the > + * x2APIC wrap or because the guest manually modified its xAPIC > + * IDs, events targeting that ID are supposed to be recognized > + * by all vCPUs with said ID. > */ > - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id]) > - new->phys_map[xapic_id] = apic; > + if (kvm->arch.x2apic_format) { > + /* See also kvm_apic_match_physical_addr(). */ > + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) && > + x2apic_id <= new->max_apic_id) > + new->phys_map[x2apic_id] = apic; > + > + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id]) > + new->phys_map[xapic_id] = apic; > + } else { > + /* > + * Disable the optimized map if the physical APIC ID is > + * already mapped, i.e. is aliased to multiple vCPUs. > + * The optimized map requires a strict 1:1 mapping > + * between IDs and vCPUs. > + */ > + if (apic_x2apic_mode(apic)) > + physical_id = x2apic_id; > + else > + physical_id = xapic_id; > + > + if (new->phys_map[physical_id]) { > + kvfree(new); > + new = NULL; > + goto out; Why not to use the same KVM_APIC_MODE_XAPIC_FLAT | KVM_APIC_MODE_XAPIC_CLUSTER hack here? Other than that looks correct but I don't know this area well enough to be 100% sure. Best regards, Maxim Levitsky > + } > + new->phys_map[physical_id] = apic; > + } > > if (!kvm_apic_sw_enabled(apic)) > continue;