+Paolo, I'm pretty sure he still doesn't subscribe to kvm@ :-)

On Mon, Dec 09, 2024, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Bernhard Kauer wrote:
> > It used a static key to avoid loading the lapic pointer from
> > the vcpu->arch structure. However, in the common case the load
> > is from a hot cacheline and the CPU should be able to perfectly
> > predict it. Thus there is no upside of this premature optimization.
> >
> > The downside is that code patching including an IPI to all CPUs
> > is required whenever the first VM without an lapic is created or
> > the last is destroyed.
> >
> > Signed-off-by: Bernhard Kauer <bk@xxxxxxxxx>
> > ---
> >
> > V1->V2: remove spillover from other patch and fix style
> >
> >  arch/x86/kvm/lapic.c | 10 ++--------
> >  arch/x86/kvm/lapic.h |  6 +-----
> >  arch/x86/kvm/x86.c   |  6 ------
> >  3 files changed, 3 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 2098dc689088..287a43fae041 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -135,8 +135,6 @@ static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> >  	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> >  }
> >
> > -__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> > -EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
> >
> >  __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> >  __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ);
>
> I'm on the fence, slightly leaning towards removing all three of these static keys.
>
> If we remove kvm_has_noapic_vcpu to avoid the text patching, then we should
> definitely drop apic_sw_disabled, as vCPUs are practically guaranteed to toggle
> the S/W enable bit, e.g. it starts out '0' at RESET. And if we drop apic_sw_disabled,
> then keeping apic_hw_disabled seems rather pointless.
>
> Removing all three keys is measurable, but the impact is so tiny that I have a
> hard time believing anyone would notice in practice.
>
> To measure, I tweaked KVM to handle CPUID exits in the fastpath and then ran the
> KVM-Unit-Test CPUID microbenchmark (with some minor modifications). Handling
> CPUID in the fastpath makes the kvm_lapic_enabled() call in the innermost run loop
> stick out (that helper checks all three keys/conditions).
>
> 	for (;;) {
> 		/*
> 		 * Assert that vCPU vs. VM APICv state is consistent. An APICv
> 		 * update must kick and wait for all vCPUs before toggling the
> 		 * per-VM state, and responding vCPUs must wait for the update
> 		 * to complete before servicing KVM_REQ_APICV_UPDATE.
> 		 */
> 		WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
> 			     (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));
>
> 		exit_fastpath = kvm_x86_call(vcpu_run)(vcpu,
> 						       req_immediate_exit);
> 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> 			break;
>
> 		if (kvm_lapic_enabled(vcpu))
> 			kvm_x86_call(sync_pir_to_irr)(vcpu);
>
> 		if (unlikely(kvm_vcpu_exit_request(vcpu))) {
> 			exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
> 			break;
> 		}
>
> 		/* Note, VM-Exits that go down the "slow" path are accounted below. */
> 		++vcpu->stat.exits;
> 	}
>
> With a single vCPU pinned to a single pCPU, the average latency for a CPUID exit
> goes from 1018 => 1027 cycles, plus or minus a few. With 8 vCPUs, no pinning
> (mostly laziness), the average latency goes from 1034 => 1053.
>
> Other flows that check multiple vCPUs, e.g. kvm_irq_delivery_to_apic(), might be
> more affected? The optimized APIC map should help for common cases, but KVM does
> still check if APICs are enabled multiple times when delivering interrupts. And
> that's really my only hesitation: there are checks *everywhere* in KVM.
>
> On the other hand, we lose gobs and gobs of cycles with far less thought. E.g.
> with mitigations on, the latency for a single vCPU jumps all the way to 1600+ cycles.
>
> And while the diff stats are quite nice, the relevant code is low maintenance.
>
>  arch/x86/kvm/lapic.c | 41 ++---------------------------------------
>  arch/x86/kvm/lapic.h | 19 +++----------------
>  arch/x86/kvm/x86.c   |  4 +---
>  3 files changed, 6 insertions(+), 58 deletions(-)
>
> Paolo or anyone else... thoughts?