On Fri, Sep 22, 2023 at 2:06 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > On Fri, Sep 22, 2023 at 1:34 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 22, 2023, Mingwei Zhang wrote:
> > > > Agree. Both patches are fixing the general potential buggy situation
> > > > of multiple PMI injection on one vm entry: one software level defense
> > > > (forcing the usage of KVM_REQ_PMI) and one hardware level defense
> > > > (preventing PMI injection using mask).
> > > >
> > > > Although neither patch in this series is fixing the root cause of this
> > > > specific double PMI injection bug, I don't see a reason why we cannot
> > > > add a "fixes" tag to them, since we may fix it and create it again.
> > > >
> > > > I am currently working on it and testing my patch. Please give me some
> > > > time, I think I could try sending out one version today. Once that is
> > > > done, I will combine mine with the existing patch and send it out as a
> > > > series.
> > >
> > > Me confused, what patch? And what does this patch have to do with Jim's series?
> > > Unless I've missed something, Jim's patches are good to go with my nits addressed.
> >
> > Let me step back.
> >
> > We have the following problem when we run perf inside guest:
> >
> > [ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
> > [ 1437.487330] Dazed and confused, but trying to continue
> >
> > This means there are more NMIs than guest PMI could understand. So
> > there are potentially two approaches to solve the problem: 1) fix the
> > PMI injection issue: only one can be injected; 2) fix the code that
> > causes the (incorrect) multiple PMI injection.
>
> No, because the LVTPC masking fix isn't optional, the current KVM behavior is a
> clear violation of the SDM. And I'm struggling to think of a sane way to fix the
> IRQ work bug, e.g.
> KVM would have to busy on the work finishing before resuming
> the guest, which is rather crazy.
>
> I'm not saying there isn't more work to be done, nor am I saying that we shouldn't
> further harden KVM against double-injection. I'm just truly confused as to what
> that has to do with Jim's fixes.
>

Hmm, I take back "two approaches". You are right on that. "Two
directions" is what I meant.

Oh, I think I did not give you the full context, which may have
caused the confusion. Sorry about that. The context of Jim's patches
is to fix the multiple PMI injections when using perf, starting from
https://lore.kernel.org/all/ZJ7y9DuedQyBb9eU@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

So, regarding the fix, there are multiple layers. They may or may not
be closely connected logically, but we are solving the same problem.
In fact, I was asking Jim to help me with this specific issue :)

So yes, they could be sent together or separately. But I don't see
why they _cannot_ go together, or why that would cause confusion. So
I would like to put them in the same series, with a cover letter
fully describing the details.

FYI for reviewers: to reproduce the spurious PMI issue in the guest
VM, you need to make KVM emulate some instructions at runtime, so
that kvm_pmu_incr_counter() is triggered more often. One option is
to add a kernel cmdline parameter like "idle=nomwait" to your guest
kernel. For the workload in the guest VM, please run the perf command
specified in
https://lore.kernel.org/all/ZKCD30QrE5g9XGIh@xxxxxxxxxx/

Thanks.
-Mingwei
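For readers new to the thread, the two defenses mentioned above (funneling PMIs through a single pending request, and honoring the LVTPC mask bit) can be sketched with a toy user-space model. This is only an illustration under stated assumptions, not KVM code: `struct toy_vcpu` and every `toy_*` function are hypothetical names, and the only hardware detail relied on is that the LVT mask bit is bit 16 and is set when a PMI is delivered through LVTPC.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED (1u << 16)   /* mask bit of a local APIC LVT entry */

/* Hypothetical toy state; none of these names exist in KVM. */
struct toy_vcpu {
    uint32_t lvtpc;         /* toy copy of the LVTPC register */
    bool     pmi_requested; /* stands in for a pending PMI request */
    unsigned nmis_seen;     /* NMIs actually delivered to the "guest" */
};

/*
 * Software-level defense: a counter overflow only records that a PMI is
 * wanted. Several overflows before the next entry collapse into one request.
 */
static void toy_counter_overflow(struct toy_vcpu *v)
{
    v->pmi_requested = true;
}

/*
 * Hardware-level defense, run once per simulated VM entry: a masked LVTPC
 * blocks delivery, and a successful delivery sets the mask bit.
 */
static void toy_vm_entry(struct toy_vcpu *v)
{
    if (!v->pmi_requested)
        return;
    v->pmi_requested = false;

    if (v->lvtpc & LVT_MASKED)
        return;             /* masked: in this toy model the PMI is dropped */

    v->nmis_seen++;
    v->lvtpc |= LVT_MASKED; /* stays masked until the guest handler unmasks */
}

/* What the guest PMI handler does when it is done: clear the mask bit. */
static void toy_guest_unmask(struct toy_vcpu *v)
{
    v->lvtpc &= ~LVT_MASKED;
}
```

In this model, calling toy_counter_overflow() twice followed by one toy_vm_entry() delivers a single NMI, and further overflows deliver nothing until toy_guest_unmask() runs. The real root cause discussed in this thread (the IRQ work path) is deliberately not modeled.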