On Fri, Sep 22, 2023 at 2:02 PM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
>
> On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > On Fri, Sep 22, 2023 at 1:34 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > > > On Fri, Sep 22, 2023 at 12:21 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > > > > On Fri, Sep 22, 2023 at 11:46 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Fri, Sep 01, 2023, Jim Mattson wrote:
> > > > > > > > When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
> > > > > > > > VM-exit that also invokes __kvm_perf_overflow() as a result of
> > > > > > > > instruction emulation, kvm_pmu_deliver_pmi() will be called twice
> > > > > > > > before the next VM-entry.
> > > > > > > >
> > > > > > > > That shouldn't be a problem. The local APIC is supposed to
> > > > > > > > automatically set the mask flag in LVTPC when it handles a PMI, so the
> > > > > > > > second PMI should be inhibited. However, KVM's local APIC emulation
> > > > > > > > fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
> > > > > > > > are delivered via the local APIC. In the common case, where LVTPC is
> > > > > > > > configured to deliver an NMI, the first NMI is vectored through the
> > > > > > > > guest IDT, and the second one is held pending. When the NMI handler
> > > > > > > > returns, the second NMI is vectored through the IDT. For Linux guests,
> > > > > > > > this results in the "dazed and confused" spurious NMI message.
> > > > > > > >
> > > > > > > > Though the obvious fix is to set the mask flag in LVTPC when handling
> > > > > > > > a PMI, KVM's logic around synthesizing a PMI is unnecessarily
> > > > > > > > convoluted.
> > > > > > >
> > > > > > > To address Like's question about whether or not this is necessary, I think we
> > > > > > > should rephrase this to explicitly state that this is a bug irrespective of the
> > > > > > > whole LVTPC masking thing.
> > > > > > >
> > > > > > > And I think it makes sense to swap the order of the two patches. The LVTPC
> > > > > > > masking fix is a clear-cut architectural violation. This one is a bit more of a
> > > > > > > grey area, though still blatantly buggy.
> > > > > >
> > > > > > The reason I ordered the patches as I did is that when this patch
> > > > > > comes first, it actually fixes the problem that was introduced in
> > > > > > commit 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring
> > > > > > instructions"). If this patch comes second, it's less clear that it
> > > > > > fixes a bug, since the other patch renders this one essentially moot.
> > > > >
> > > > > Yeah, but as Like pointed out, the way the changelog is worded just raises the
> > > > > question of why this change is necessary.
> > > > >
> > > > > I think we should tag them both for stable. They're both bug fixes, regardless
> > > > > of the ordering.
> > > >
> > > > Agree. Both patches address the general problem of multiple PMI
> > > > injections on a single VM-entry: one is a software-level defense
> > > > (forcing the usage of KVM_REQ_PMI) and one is a hardware-level defense
> > > > (preventing PMI injection using the LVTPC mask).
> > > >
> > > > Although neither patch in this series fixes the root cause of this
> > > > specific double PMI injection bug, I don't see a reason why we cannot
> > > > add a "Fixes" tag to them, since the root cause may be fixed now and
> > > > reintroduced again later.
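
As an aside, the "hardware-level defense" above is the LVTPC masking fix
from Jim's series: architecturally, the local APIC sets the mask bit in
LVTPC when it dispatches a PMI, so a second PMI is held off until the
guest's handler unmasks it. In KVM's local APIC emulation, that fix
amounts to roughly the following in kvm_apic_local_deliver() (an
untested sketch of the idea, not necessarily Jim's exact patch):

static int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
{
	u32 reg = kvm_lapic_get_reg(apic, lvt_type);
	int vector, mode, trig_mode;
	int r;

	if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
		vector = reg & APIC_VECTOR_MASK;
		mode = reg & APIC_MODE_MASK;
		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;

		r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);

		/*
		 * Mirror real hardware: inhibit further PMIs until the
		 * guest unmasks LVTPC, so a second PMI synthesized before
		 * the next VM-entry cannot be delivered on top of the
		 * first one.
		 */
		if (r && lvt_type == APIC_LVTPC)
			kvm_lapic_set_reg(apic, lvt_type,
					  reg | APIC_LVT_MASKED);
		return r;
	}
	return 0;
}

With something like that in place, a second kvm_pmu_deliver_pmi() before
the next VM-entry finds LVTPC masked and is dropped, as on real hardware.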
> > > >
> > > > I am currently working on it and testing my patch. Please give me some
> > > > time; I think I could try sending out one version today. Once that is
> > > > done, I will combine mine with the existing patch and send it out as a
> > > > series.
> > >
> > > Me confused, what patch? And what does this patch have to do with Jim's series?
> > > Unless I've missed something, Jim's patches are good to go with my nits addressed.
> >
> > Let me step back.
> >
> > We have the following problem when we run perf inside the guest:
> >
> > [ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
> > [ 1437.487330] Dazed and confused, but trying to continue
> >
> > This means there are more NMIs than the guest PMI handler can account
> > for. So there are potentially two approaches to solve the problem:
> > 1) fix the PMI injection issue so that only one PMI can be injected;
> > 2) fix the code that causes the (incorrect) multiple PMI injections.
> >
> > I am working on the 2nd one. The property of the 2nd approach is that,
> > even without the patches in 1) (Jim's patches), we can still avoid the
> > above warning messages.
> >
> > Thanks.
> > -Mingwei
>
> This is my draft version. If you don't have full-width counter support, this
> patch needs to be placed on top of this one:
> https://lore.kernel.org/all/20230504120042.785651-1-rkagan@xxxxxxxxx/
>
> My initial testing on both QEMU and our GCP testing environment shows no
> "Uhhuh..." dmesg in the guest.
>
> Please take a look...
>
> From 47e629269d8b0ff65c242334f068300216cb7f91 Mon Sep 17 00:00:00 2001
> From: Mingwei Zhang <mizhang@xxxxxxxxxx>
> Date: Fri, 22 Sep 2023 17:13:55 +0000
> Subject: [PATCH] KVM: x86/pmu: Fix emulated counter increment due to
>  instruction emulation
>
> Fix the emulated counter increment performed for instruction emulation.
> KVM's pmc->counter is only a snapshot taken while the counter is
> running; it does not represent the counter's actual current value, so
> it is inappropriate to compare it against other counter values. The
> existing code compares pmc->prev_counter with pmc->counter directly,
> but pmc->prev_counter is itself a snapshot assigned from pmc->counter
> while the counter may still be running. This comparison logic in
> reprogram_counter() therefore generates incorrect invocations of
> __kvm_perf_overflow(in_pmi=false) and duplicated PMI injection
> requests.
>
> Fix this issue by adding an emulated_counter field and doing the
> counter arithmetic only after the counter has been paused.
>
> Change-Id: I2d59e68557fd35f7bbcfe09ea42ad81bd36776b7
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/pmu.c              | 15 ++++++++-------
>  arch/x86/kvm/svm/pmu.c          |  1 +
>  arch/x86/kvm/vmx/pmu_intel.c    |  2 ++
>  4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1a4def36d5bb..47bbfbc0aa35 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -494,6 +494,7 @@ struct kvm_pmc {
>  	bool intr;
>  	u64 counter;
>  	u64 prev_counter;
> +	u64 emulated_counter;
>  	u64 eventsel;
>  	struct perf_event *perf_event;
>  	struct kvm_vcpu *vcpu;
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index edb89b51b383..47acf3a2b077 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -240,12 +240,13 @@ static void pmc_pause_counter(struct kvm_pmc *pmc)
>  {
>  	u64 counter = pmc->counter;
>  
> -	if (!pmc->perf_event || pmc->is_paused)
> -		return;
> -
>  	/* update counter, reset event value to avoid redundant accumulation */
> -	counter += perf_event_pause(pmc->perf_event, true);
> -	pmc->counter = counter & pmc_bitmask(pmc);
> +	if (pmc->perf_event && !pmc->is_paused)
> +		counter += perf_event_pause(pmc->perf_event, true);
> +
> +	pmc->prev_counter = counter & pmc_bitmask(pmc);
> +	pmc->counter = (counter + pmc->emulated_counter) & pmc_bitmask(pmc);
> +	pmc->emulated_counter = 0;
>  	pmc->is_paused = true;
>  }
>  
> @@ -452,6 +453,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)
>  reprogram_complete:
>  	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>  	pmc->prev_counter = 0;
> +	pmc->emulated_counter = 0;
>  }
>  
>  void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
> @@ -725,8 +727,7 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
>  
>  static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
>  {
> -	pmc->prev_counter = pmc->counter;
> -	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
> +	pmc->emulated_counter += 1;
>  	kvm_pmu_request_counter_reprogram(pmc);
>  }
>  
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index a25b91ff9aea..b88fab4ae1d7 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -243,6 +243,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
>  
>  		pmc_stop_counter(pmc);
>  		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
> +		pmc->emulated_counter = 0;
>  	}
>  
>  	pmu->global_ctrl = pmu->global_status = 0;
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 626df5fdf542..d03c4ec7273d 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -641,6 +641,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>  
>  		pmc_stop_counter(pmc);
>  		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
> +		pmc->emulated_counter = 0;
>  	}
>  
>  	for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
> @@ -648,6 +649,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>  
>  		pmc_stop_counter(pmc);
>  		pmc->counter = pmc->prev_counter = 0;
> +		pmc->emulated_counter = 0;
>  	}
>  
>  	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
> --
> 2.42.0.515.g380fc7ccd1-goog

Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
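
To make the new accounting concrete: emulated events accumulate in
pmc->emulated_counter and are folded into pmc->counter only after the
hardware count has been snapshotted into pmc->prev_counter, so the wrap
check in reprogram_counter(), i.e. pmc->counter < pmc->prev_counter,
fires only when emulated increments genuinely overflowed the counter.
A small stand-alone model of the arithmetic (illustration only;
hypothetical user-space code mirroring the patched pmc_pause_counter(),
not kernel code):

/* Stand-alone model of the emulated_counter accounting. */
#include <stdio.h>
#include <stdint.h>

#define PMC_BITMASK ((1ULL << 48) - 1)	/* e.g. a 48-bit wide PMC */

struct pmc {
	uint64_t counter;		/* snapshot of the (paused) count */
	uint64_t prev_counter;
	uint64_t emulated_counter;	/* events counted via emulation */
};

/*
 * Mirrors the patched pmc_pause_counter(): hw_delta stands in for the
 * value perf_event_pause() returns; emulated events are folded in only
 * after the hardware count has been snapshotted.
 */
static void pause_counter(struct pmc *pmc, uint64_t hw_delta)
{
	uint64_t counter = pmc->counter + hw_delta;

	pmc->prev_counter = counter & PMC_BITMASK;
	pmc->counter = (counter + pmc->emulated_counter) & PMC_BITMASK;
	pmc->emulated_counter = 0;
}

int main(void)
{
	/* Counter programmed two events away from overflow. */
	struct pmc pmc = { .counter = PMC_BITMASK - 1 };

	/* Three instructions retired via emulation, none in hardware. */
	pmc.emulated_counter = 3;
	pause_counter(&pmc, 0);

	/* The wrap check from reprogram_counter(): a single, genuine
	 * overflow is detected, so exactly one PMI is requested. */
	if (pmc.counter < pmc.prev_counter)
		printf("overflow -> request one PMI\n");

	return 0;
}

Running this prints the overflow line exactly once, matching the single
PMI that the patched kvm_pmu_incr_counter() path should ultimately
request via KVM_REQ_PMI.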