On Fri, Sep 22, 2023, Mingwei Zhang wrote: > On Fri, Sep 22, 2023, Mingwei Zhang wrote: > My initial testing on both QEMU and our GCP testing environment shows no > "Uhhuh..." dmesg in guest. > > Please take a look... And now I'm extra confused, I thought the plan was for me to post a proper series for the emulated_counter idea[*], get the code base healthy/functional, and then build on top, e.g. improve performance and whatnot. The below is just a stripped down version of that, and doesn't look quite right. Specifically, pmc_write_counter() needs to purge emulated_count (my pseudo-patch subtly handled that by pausing the counter). I totally realize I'm moving sloooow, but I pinky swear I'm getting there. My compile-tested-only branch can be found at https://github.com/sean-jc/linux x86/pmu_refactor There's a lot more in there, e.g. it has fixes from Roman and Jim, along with some other found-by-inspection cleanups. I dropped the "pause on WRMSR" proposal. I still don't love the offset approach, but I agree that pausing and reprogramming counters on writes could introduce an entirely new set of problems. I'm logging off for the weekend, but I'll pick this back up next (it's at the top of my priority list, assuming guest_memfd doesn't somehow catch fire. [*] https://lore.kernel.org/all/ZJ9IaskpbIK9q4rt@xxxxxxxxxx > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c > index edb89b51b383..47acf3a2b077 100644 > --- a/arch/x86/kvm/pmu.c > +++ b/arch/x86/kvm/pmu.c > @@ -240,12 +240,13 @@ static void pmc_pause_counter(struct kvm_pmc *pmc) > { > u64 counter = pmc->counter; > > - if (!pmc->perf_event || pmc->is_paused) > - return; > - > /* update counter, reset event value to avoid redundant accumulation */ > - counter += perf_event_pause(pmc->perf_event, true); > - pmc->counter = counter & pmc_bitmask(pmc); > + if (pmc->perf_event && !pmc->is_paused) > + counter += perf_event_pause(pmc->perf_event, true); > + > + pmc->prev_counter = counter & pmc_bitmask(pmc); Honest question, is it actually correct/necessary to mask the counter at the intermediate steps? Conceptually, the hardware count and the emulated count are the same thing, just tracked separately. > + pmc->counter = (counter + pmc->emulated_counter) & pmc_bitmask(pmc); > + pmc->emulated_counter = 0; > pmc->is_paused = true; > } > > @@ -452,6 +453,7 @@ static void reprogram_counter(struct kvm_pmc *pmc) > reprogram_complete: > clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi); > pmc->prev_counter = 0; I don't see any reason to keep kvm_pmc.prev_counter. reprogram_counter() is the only caller of pmc_pause_counter(), and so is effectively the only writer and the only reader. I.e. prev_counter can just be a local variable in reprogram_counter(), no?