Re: [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sean,

On Fri, Sep 22, 2023 at 3:44 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > My initial testing on both QEMU and our GCP testing environment shows no
> > "Uhhuh..." dmesg in guest.
> >
> > Please take a look...
>
> And now I'm extra confused, I thought the plan was for me to post a proper series
> for the emulated_counter idea[*], get the code base healthy/functional, and then
> build on top, e.g. improve performance and whatnot.
>
> The below is just a stripped down version of that, and doesn't look quite right.
> Specifically, pmc_write_counter() needs to purge emulated_count (my pseudo-patch
> subtly handled that by pausing the counter).
>
> I totally realize I'm moving sloooow, but I pinky swear I'm getting there.  My
> compile-tested-only branch can be found at
>
>   https://github.com/sean-jc/linux x86/pmu_refactor
>
> There's a lot more in there, e.g. it has fixes from Roman and Jim, along with
> some other found-by-inspection cleanups.

Sean, thanks for the work you have done on the thread:
https://lore.kernel.org/all/ZJ9IaskpbIK9q4rt@xxxxxxxxxx

I think the diff you posted helped quite a lot. In fact, Jim also
asked me to try using emulated_counter and I thought that just fixed
the issue. I tried my own version as well as yours. However, neither
could fix this problem at that time. So, Jim took a further look on
the lower level and I was stuck on the performance analysis until
recently I came back and discovered the real fix for this.

Your diff (or I should say your patch) covers lots of things including
the adding of emulated_counter, some refactoring code on pmu reset and
pause-on-wrmsr. In comparison, my code just focuses on the bug fixing
for the duplicate PMI, since that's what I care for production.
Although the code looks somewhat similar, the thought process and
intention are quite different.

Sorry to confuse you. To resolve this issue, I am wondering if adding
you into "Co-developed-by:" would be a valid choice? Or the other way
around like adding me as a co-developer into one patch of your series.

In addition, I have no interest in further refactoring the existing
vPMU code. So for that I will definitely step aside.

Update: it looks like both my patch and Jim's patches (applied
separately) break the kvm-unit-test/pmu with the following error on
SPR:

FAIL: Intel: emulated instruction: instruction counter overflow
FAIL: Intel: full-width writes: emulated instruction: instruction
counter overflow

Not sure whether it is a test error or not.

>
> I dropped the "pause on WRMSR" proposal.  I still don't love the offset approach,
> but I agree that pausing and reprogramming counters on writes could introduce an
> entirely new set of problems.
>
> I'm logging off for the weekend, but I'll pick this back up next (it's at the
> top of my priority list, assuming guest_memfd doesn't somehow catch fire.
>
> [*] https://lore.kernel.org/all/ZJ9IaskpbIK9q4rt@xxxxxxxxxx
>
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index edb89b51b383..47acf3a2b077 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -240,12 +240,13 @@ static void pmc_pause_counter(struct kvm_pmc *pmc)
> >  {
> >       u64 counter = pmc->counter;
> >
> > -     if (!pmc->perf_event || pmc->is_paused)
> > -             return;
> > -
> >       /* update counter, reset event value to avoid redundant accumulation */
> > -     counter += perf_event_pause(pmc->perf_event, true);
> > -     pmc->counter = counter & pmc_bitmask(pmc);
> > +     if (pmc->perf_event && !pmc->is_paused)
> > +             counter += perf_event_pause(pmc->perf_event, true);
> > +
> > +     pmc->prev_counter = counter & pmc_bitmask(pmc);
>
> Honest question, is it actually correct/necessary to mask the counter at the
> intermediate steps?  Conceptually, the hardware count and the emulated count are
> the same thing, just tracked separately.

It is valid because the counter has stopped, and now pmc->counter
contains the valid value. So, it can move into different variables and
do the comparison. Regarding the intermediate steps, I don't think
this is visible to the guest, no? Otherwise, we may have to use tmp
variables and then make the assignment atomic, although I doubt if
that is needed.

>
> > +     pmc->counter = (counter + pmc->emulated_counter) & pmc_bitmask(pmc);
> > +     pmc->emulated_counter = 0;
> >       pmc->is_paused = true;
> >  }
> >
> > @@ -452,6 +453,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)
> >  reprogram_complete:
> >       clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
> >       pmc->prev_counter = 0;
>
> I don't see any reason to keep kvm_pmc.prev_counter.  reprogram_counter() is the
> only caller of pmc_pause_counter(), and so is effectively the only writer and the
> only reader.  I.e. prev_counter can just be a local variable in reprogram_counter(),
> no?

You are right, I will update in the next version.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux