On Mon, Oct 21, 2024, Bernhard Kauer wrote: > It used a static key to avoid loading the lapic pointer from > the vcpu->arch structure. However, in the common case the load > is from a hot cacheline and the CPU should be able to perfectly > predict it. Thus there is no upside of this premature optimization. > > The downside is that code patching including an IPI to all CPUs > is required whenever the first VM without an lapic is created or > the last is destroyed. > > Signed-off-by: Bernhard Kauer <bk@xxxxxxxxx> > --- > > V1->V2: remove spillover from other patch and fix style > > arch/x86/kvm/lapic.c | 10 ++-------- > arch/x86/kvm/lapic.h | 6 +----- > arch/x86/kvm/x86.c | 6 ------ > 3 files changed, 3 insertions(+), 19 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 2098dc689088..287a43fae041 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -135,8 +135,6 @@ static inline int __apic_test_and_clear_vector(int vec, void *bitmap) > return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); > } > > -__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); > -EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); > > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ); I'm on the fence, slightly leaning towards removing all three of these static keys. If we remove kvm_has_noapic_vcpu to avoid the text patching, then we should definitely drop apic_sw_disabled, as vCPUs are practically guaranteed to toggle the S/W enable bit, e.g. it starts out '0' at RESET. And if we drop apic_sw_disabled, then keeping apic_hw_disabled seems rather pointless. Removing all three keys is measurable, but the impact is so tiny that I have a hard time believing anyone would notice in practice. To measure, I tweaked KVM to handle CPUID exits in the fastpath and then ran the KVM-Unit-Test CPUID microbenchmark (with some minor modifications). Handling CPUID in the fastpath makes the kvm_lapic_enabled() call in the innermost run loop stick out (that helpers checks all three keys/conditions). for (;;) { /* * Assert that vCPU vs. VM APICv state is consistent. An APICv * update must kick and wait for all vCPUs before toggling the * per-VM state, and responding vCPUs must wait for the update * to complete before servicing KVM_REQ_APICV_UPDATE. */ WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) && (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED)); exit_fastpath = kvm_x86_call(vcpu_run)(vcpu, req_immediate_exit); if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST)) break; if (kvm_lapic_enabled(vcpu)) kvm_x86_call(sync_pir_to_irr)(vcpu); if (unlikely(kvm_vcpu_exit_request(vcpu))) { exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED; break; } /* Note, VM-Exits that go down the "slow" path are accounted below. */ ++vcpu->stat.exits; } With a single vCPU pinned to a single pCPU, the average latency for a CPUID exit goes from 1018 => 1027 cycles, plus or minus a few. With 8 vCPUs, no pinning (mostly laziness), the average latency goes from 1034 => 1053. Other flows that check multiple vCPUs, e.g. kvm_irq_delivery_to_apic(), might be more affected? The optimized APIC map should help for common cases, but KVM does still check if APICs are enabled multiple times when delivering interrupts. And that's really my only hesitation: there are checks *everywhere* in KVM. On the other hand, we lose gobs and gobs of cycles with far less thought. E.g. with mitigations on, the latency for a single vCPU jumps all the way to 1600+ cycles. And while the diff stats are quite nice, the relevant code is low maintenance. arch/x86/kvm/lapic.c | 41 ++--------------------------------------- arch/x86/kvm/lapic.h | 19 +++---------------- arch/x86/kvm/x86.c | 4 +--- 3 files changed, 6 insertions(+), 58 deletions(-) Paolo or anyone else... thoughts?