Hi Sean, On Fri, Sep 22, 2023 at 3:44 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Fri, Sep 22, 2023, Mingwei Zhang wrote: > > On Fri, Sep 22, 2023, Mingwei Zhang wrote: > > My initial testing on both QEMU and our GCP testing environment shows no > > "Uhhuh..." dmesg in guest. > > > > Please take a look... > > And now I'm extra confused, I thought the plan was for me to post a proper series > for the emulated_counter idea[*], get the code base healthy/functional, and then > build on top, e.g. improve performance and whatnot. > > The below is just a stripped down version of that, and doesn't look quite right. > Specifically, pmc_write_counter() needs to purge emulated_count (my pseudo-patch > subtly handled that by pausing the counter). > > I totally realize I'm moving sloooow, but I pinky swear I'm getting there. My > compile-tested-only branch can be found at > > https://github.com/sean-jc/linux x86/pmu_refactor > > There's a lot more in there, e.g. it has fixes from Roman and Jim, along with > some other found-by-inspection cleanups. Sean, thanks for the work you have done on the thread: https://lore.kernel.org/all/ZJ9IaskpbIK9q4rt@xxxxxxxxxx I think the diff you posted helped quite a lot. In fact, Jim also asked me to try using emulated_counter and I thought that just fixed the issue. I tried my own version as well as yours. However, neither could fix this problem at that time. So, Jim took a further look on the lower level and I was stuck on the performance analysis until recently I came back and discovered the real fix for this. Your diff (or I should say your patch) covers lots of things including the adding of emulated_counter, some refactoring code on pmu reset and pause-on-wrmsr. In comparison, my code just focuses on the bug fixing for the duplicate PMI, since that's what I care for production. Although the code looks somewhat similar, the thought process and intention are quite different. Sorry to confuse you. To resolve this issue, I am wondering if adding you into "Co-developed-by:" would be a valid choice? Or the other way around like adding me as a co-developer into one patch of your series. In addition, I have no interest in further refactoring the existing vPMU code. So for that I will definitely step aside. Update: it looks like both my patch and Jim's patches (applied separately) break the kvm-unit-test/pmu with the following error on SPR: FAIL: Intel: emulated instruction: instruction counter overflow FAIL: Intel: full-width writes: emulated instruction: instruction counter overflow Not sure whether it is a test error or not. > > I dropped the "pause on WRMSR" proposal. I still don't love the offset approach, > but I agree that pausing and reprogramming counters on writes could introduce an > entirely new set of problems. > > I'm logging off for the weekend, but I'll pick this back up next (it's at the > top of my priority list, assuming guest_memfd doesn't somehow catch fire. > > [*] https://lore.kernel.org/all/ZJ9IaskpbIK9q4rt@xxxxxxxxxx > > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c > > index edb89b51b383..47acf3a2b077 100644 > > --- a/arch/x86/kvm/pmu.c > > +++ b/arch/x86/kvm/pmu.c > > @@ -240,12 +240,13 @@ static void pmc_pause_counter(struct kvm_pmc *pmc) > > { > > u64 counter = pmc->counter; > > > > - if (!pmc->perf_event || pmc->is_paused) > > - return; > > - > > /* update counter, reset event value to avoid redundant accumulation */ > > - counter += perf_event_pause(pmc->perf_event, true); > > - pmc->counter = counter & pmc_bitmask(pmc); > > + if (pmc->perf_event && !pmc->is_paused) > > + counter += perf_event_pause(pmc->perf_event, true); > > + > > + pmc->prev_counter = counter & pmc_bitmask(pmc); > > Honest question, is it actually correct/necessary to mask the counter at the > intermediate steps? Conceptually, the hardware count and the emulated count are > the same thing, just tracked separately. It is valid because the counter has stopped, and now pmc->counter contains the valid value. So, it can move into different variables and do the comparison. Regarding the intermediate steps, I don't think this is visible to the guest, no? Otherwise, we may have to use tmp variables and then make the assignment atomic, although I doubt if that is needed. > > > + pmc->counter = (counter + pmc->emulated_counter) & pmc_bitmask(pmc); > > + pmc->emulated_counter = 0; > > pmc->is_paused = true; > > } > > > > @@ -452,6 +453,7 @@ static void reprogram_counter(struct kvm_pmc *pmc) > > reprogram_complete: > > clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi); > > pmc->prev_counter = 0; > > I don't see any reason to keep kvm_pmc.prev_counter. reprogram_counter() is the > only caller of pmc_pause_counter(), and so is effectively the only writer and the > only reader. I.e. prev_counter can just be a local variable in reprogram_counter(), > no? You are right, I will update in the next version.