On Fri, Jun 30, 2023 at 9:40 AM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Fri, Jun 30, 2023 at 8:21 AM Roman Kagan <rkagan@xxxxxxxxx> wrote:
> >
> > On Fri, Jun 30, 2023 at 07:28:29AM -0700, Sean Christopherson wrote:
> > > On Fri, Jun 30, 2023, Roman Kagan wrote:
> > > > On Thu, Jun 29, 2023 at 05:11:06PM -0700, Sean Christopherson wrote:
> > > > > @@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
> > > > >       return counter & pmc_bitmask(pmc);
> > > > >  }
> > > > >
> > > > > +static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
> > > > > +{
> > > > > +     if (pmc->perf_event && !pmc->is_paused)
> > > > > +             perf_event_set_count(pmc->perf_event, val);
> > > > > +
> > > > > +     pmc->counter = val;
> > > >
> > > > Doesn't this still have the original problem of storing wider value than
> > > > allowed?
> > >
> > > Yes, this was just to fix the counter offset weirdness.  My plan is to apply your
> > > patch on top.  Sorry for not making that clear.
> >
> > Ah, got it, thanks!
> >
> > Also I'm now chasing a problem that we occasionally see
> >
> > [3939579.462832] Uhhuh. NMI received for unknown reason 30 on CPU 43.
> > [3939579.462836] Do you have a strange power saving mode enabled?
> > [3939579.462836] Dazed and confused, but trying to continue
> >
> > in the guests when perf is used.  These messages disappear when
> > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") is
> > reverted.  I haven't yet figured out where exactly the culprit is.
>
> Maybe this is because KVM doesn't virtualize
> IA32_DEBUGCTL.Freeze_PerfMon_On_PMI?

Never mind. Linux doesn't set IA32_DEBUGCTL.Freeze_PerfMon_On_PMI.

> Consider:
>
> 1. PMC0 overflows, GLOBAL_STATUS[0] is set, and an NMI is delivered.
> 2. Before the guest's PMI handler clears GLOBAL_CTRL, PMC1 overflows,
> GLOBAL_STATUS[1] is set, and an NMI is queued for delivery after the
> next IRET.
> 3. The guest's PMI handler clears GLOBAL_CTRL, reads 3 from
> GLOBAL_STATUS, writes 3 to GLOBAL_OVF_CTRL, re-enables GLOBAL_CTRL,
> and IRETs.
> 4. The queued NMI is delivered, but GLOBAL_STATUS is now 0. No one
> claims the NMI, so we get the spurious NMI message.
>
> I don't know why this would require counting the retirement of
> emulated instructions. It seems that hardware PMC overflow in the
> early part of the guest's PMI handler would also be a problem.
>
> > Thanks,
> > Roman.
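
For what it's worth, the four-step sequence quoted above can be replayed with a toy model. This is only a sketch under stated assumptions: struct toy_pmu and all toy_* functions are made up for illustration and are not KVM or Linux code, and NMI blocking is reduced to a single latched NMI that is delivered at the next IRET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the state in the scenario; hypothetical, not KVM code. */
struct toy_pmu {
	uint64_t global_status;   /* one overflow bit per counter */
	bool     in_nmi;          /* guest is inside its NMI handler */
	bool     nmi_latched;     /* one NMI held pending until IRET */
	unsigned handled_nmis;    /* NMIs the PMI handler claimed */
	unsigned unknown_nmis;    /* NMIs nobody claimed */
};

/* Counter idx overflows: set its status bit and raise an NMI. */
static void toy_overflow(struct toy_pmu *p, unsigned idx)
{
	assert(idx < 64);
	p->global_status |= UINT64_C(1) << idx;
	if (p->in_nmi)
		p->nmi_latched = true;   /* blocked until the next IRET */
	else
		p->in_nmi = true;        /* delivered immediately */
}

/* PMI handler body: read and acknowledge every pending overflow bit. */
static void toy_pmi_handler(struct toy_pmu *p)
{
	uint64_t status = p->global_status;   /* reads 3 in step 3 */

	if (status) {
		p->global_status &= ~status;  /* models the GLOBAL_OVF_CTRL write */
		p->handled_nmis++;
	} else {
		p->unknown_nmis++;            /* nothing to claim: spurious NMI */
	}
}

/* IRET: unblock NMIs; returns true if a latched NMI is delivered now. */
static bool toy_iret(struct toy_pmu *p)
{
	p->in_nmi = p->nmi_latched;
	p->nmi_latched = false;
	return p->in_nmi;
}

/* Replay steps 1-4 and return the number of unclaimed NMIs. */
static unsigned toy_replay(void)
{
	struct toy_pmu p = {0};

	toy_overflow(&p, 0);          /* step 1: PMC0 overflows, NMI delivered */
	toy_overflow(&p, 1);          /* step 2: PMC1 overflows, NMI latched */
	toy_pmi_handler(&p);          /* step 3: handler acks both bits */
	while (toy_iret(&p))          /* step 4: latched NMI sees status == 0 */
		toy_pmi_handler(&p);
	return p.unknown_nmis;
}
```

Calling toy_replay() counts one unclaimed NMI in this model, matching step 4. The model says nothing about why reverting 9cd803d496e7 makes the messages go away.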
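
On the earlier question in the thread about storing a wider value than allowed, one possible shape of a write helper that truncates to the counter width is sketched below. This is an assumption for illustration, not the patch referred to in the thread: struct toy_pmc, its width field, toy_bitmask() and toy_write_counter() are hypothetical stand-ins for struct kvm_pmc, pmc_bitmask() and pmc_write_counter().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_pmc; only the fields used here. */
struct toy_pmc {
	uint64_t counter;
	unsigned width;   /* counter width in bits, 1..64 */
};

/* Mask with the low `width` bits set, in the spirit of pmc_bitmask(). */
static uint64_t toy_bitmask(const struct toy_pmc *pmc)
{
	assert(pmc->width >= 1 && pmc->width <= 64);
	return pmc->width == 64 ? UINT64_MAX
				: (UINT64_C(1) << pmc->width) - 1;
}

/* Store val truncated to the counter width, so no wider value is kept. */
static void toy_write_counter(struct toy_pmc *pmc, uint64_t val)
{
	pmc->counter = val & toy_bitmask(pmc);
}
```

With a 48-bit counter, writing an all-ones value through toy_write_counter() leaves only the low 48 bits set, which is the same masking pmc_read_counter() applies on the read side.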