Re: [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit

On Fri, Sep 22, 2023 at 1:34 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > On Fri, Sep 22, 2023 at 12:21 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > > On Fri, Sep 22, 2023 at 11:46 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Sep 01, 2023, Jim Mattson wrote:
> > > > > > When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
> > > > > > VM-exit that also invokes __kvm_perf_overflow() as a result of
> > > > > > instruction emulation, kvm_pmu_deliver_pmi() will be called twice
> > > > > > before the next VM-entry.
> > > > > >
> > > > > > That shouldn't be a problem. The local APIC is supposed to
> > > > > > automatically set the mask flag in LVTPC when it handles a PMI, so the
> > > > > > second PMI should be inhibited. However, KVM's local APIC emulation
> > > > > > fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
> > > > > > are delivered via the local APIC. In the common case, where LVTPC is
> > > > > > configured to deliver an NMI, the first NMI is vectored through the
> > > > > > guest IDT, and the second one is held pending. When the NMI handler
> > > > > > returns, the second NMI is vectored through the IDT. For Linux guests,
> > > > > > this results in the "dazed and confused" spurious NMI message.
> > > > > >
> > > > > > Though the obvious fix is to set the mask flag in LVTPC when handling
> > > > > > a PMI, KVM's logic around synthesizing a PMI is unnecessarily
> > > > > > convoluted.
> > > > >
> > > > > > To address Like's question about whether or not this is necessary, I think we should
> > > > > rephrase this to explicitly state this is a bug irrespective of the whole LVTPC
> > > > > masking thing.
> > > > >
> > > > > And I think it makes sense to swap the order of the two patches.  The LVTPC masking
> > > > > fix is a clearcut architectural violation.  This is a bit more of a grey area,
> > > > > though still blatantly buggy.
> > > >
> > > > The reason I ordered the patches as I did is that when this patch
> > > > comes first, it actually fixes the problem that was introduced in
> > > > commit 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring
> > > > instructions"). If this patch comes second, it's less clear that it
> > > > fixes a bug, since the other patch renders this one essentially moot.
> > >
> > > Yeah, but as Like pointed out, the way the changelog is worded just raises the
> > > question of why this change is necessary.
> > >
> > > I think we should tag them both for stable.  They're both bug fixes, regardless
> > > of the ordering.
> >
> > Agree. Both patches are fixing the general potential buggy situation
> > of multiple PMI injection on one vm entry: one software level defense
> > (forcing the usage of KVM_REQ_PMI) and one hardware level defense
> > (preventing PMI injection using mask).
> >
> > Although neither patch in this series is fixing the root cause of this
> > specific double PMI injection bug, I don't see a reason why we cannot
> > add a "fixes" tag to them, since we may fix it and create it again.
> >
> > I am currently working on it and testing my patch. Please give me some
> > time, I think I could try sending out one version today. Once that is
> > done, I will combine mine with the existing patch and send it out as a
> > series.
>
> Me confused, what patch?  And what does this patch have to do with Jim's series?
> Unless I've missed something, Jim's patches are good to go with my nits addressed.

Let me step back.

We have the following problem when we run perf inside a guest:

[ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
[ 1437.487330] Dazed and confused, but trying to continue

This means the guest received more NMIs than its PMI handler could
account for. So there are potentially two approaches to solving the
problem: 1) fix the PMI injection path so that only one PMI can be
injected; 2) fix the code that causes the (incorrect) multiple PMI
injections.

I am working on the second one. Its key property is that, even
without the patches in 1) (Jim's patches), we could still avoid the
above warning messages.
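
For reference, the two defenses in approach 1) can be modeled roughly
as below. This is only a toy sketch, not KVM code: struct toy_vcpu and
all toy_* functions are hypothetical names made up for illustration.
The only detail taken from the architecture is that the mask flag of
an APIC LVT entry is bit 16. The sketch assumes every overflow source
records a single pending request (standing in for KVM_REQ_PMI) and
that delivering a PMI sets the mask flag in LVTPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are KVM's real data structures. */
#define TOY_LVT_MASKED (1u << 16)   /* mask flag position in an APIC LVT entry */

struct toy_vcpu {
    bool     pmi_requested;   /* stands in for a pending KVM_REQ_PMI */
    uint32_t lvtpc;           /* stands in for the guest's LVTPC register */
    unsigned nmis_delivered;  /* NMIs the guest would actually observe */
};

/*
 * Software-level defense: overflow sources only record a request.
 * Calling this several times before the next VM-entry still leaves
 * exactly one pending request.
 */
static void toy_request_pmi(struct toy_vcpu *v)
{
    assert(v);
    v->pmi_requested = true;
}

/*
 * Hardware-level defense: a masked LVTPC entry inhibits delivery, and
 * a successful delivery sets the mask flag so that a second PMI is
 * dropped until the guest handler clears the flag again.
 */
static bool toy_deliver_pmi(struct toy_vcpu *v)
{
    assert(v);
    if (v->lvtpc & TOY_LVT_MASKED)
        return false;
    v->nmis_delivered++;
    v->lvtpc |= TOY_LVT_MASKED;
    return true;
}

/* Called once before each (modeled) VM-entry. */
static void toy_service_requests(struct toy_vcpu *v)
{
    assert(v);
    if (!v->pmi_requested)
        return;
    v->pmi_requested = false;
    toy_deliver_pmi(v);
}
```

With this model, two toy_request_pmi() calls followed by one
toy_service_requests() call produce a single NMI, and a direct second
toy_deliver_pmi() call is dropped until the mask flag is cleared.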

Thanks.
-Mingwei



