On Wed, Apr 26, 2023 at 10:02 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > This is probably the sub-PR for which I'm more interested in giving > > the code a closer look, but this is more about understanding the > > changes than it is about expecting something bad in it. > > 100% agree. If you were to scrutinize only one thing for 6.4, the vNMI changes > are definitely my choice for extra eyeballs. Interesting read. The split commits left me wondering _why_ patches 1-7 were needed for vNMI, but that's a known limitation of losing the cover letter, and the Link or Message-Id trailers try to amend for that. I have a few comments indeed, most of which are absolutely nits and can be ignored or fixed as follow-ups. It's my turn to send a "belated review" patch series, which I'll do for -rc2, but please check if there are any disagreements. First of all, this comment caught my attention: + /* + * Rules for synchronizing int_ctl bits from vmcb02 to vmcb01: + * + * V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR: If L1 doesn't + * intercept interrupts, then KVM will use vmcb02's V_IRQ (and related + * flags) to detect interrupt windows for L1 IRQs (even if L1 uses + * virtual interrupt masking). Raise KVM_REQ_EVENT to ensure that + * KVM re-requests an interrupt window if necessary, which implicitly + * copies this bits from vmcb02 to vmcb01. + * + * V_TPR: If L1 doesn't use virtual interrupt masking, then L1's vTPR + * is stored in vmcb02, but its value doesn't need to be copied from/to + * vmcb01 because it is copied from/to the virtual APIC's TPR register + * on each VM entry/exit. + * + * V_GIF: If nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's + * V_GIF. However, GIF is architecturally clear on each VM exit, thus + * there is no need to copy V_GIF from vmcb02 to vmcb01. + */ "Rules for synchronizing int_ctl bits from vmcb02 to vmcb01" suggests that this is work done here, and it also misled me into looking at nested_sync_control_from_vmcb02 (which is instead about vmcb02 -> vmcb12). So what about replacing it with * int_ctl bits from vmcb02 have to be synchronized to both vmcb12 and vmcb01. * The former is in nested_sync_control_from_vmcb02, invoked on every vmexit, * while the latter is scattered all over the place: and perhaps also call out nested_svm_virtualize_tpr(), sync_lapic_to_cr8() and sync_cr8_to_lapic in the V_TPR part? Another super small thing which is not worth a respin (would have been): Subject: [PATCH 05/13] KVM: x86: Raise an event request when processing NMIs - if an NMI is pending + iff an NMI is pending The "if" suggests that we were missing an event request, while "iff" suggests that we were doing them unnecessarily. As an aside, I like the "coding style violation" of commit 400fee8c9b2d. Because the "limit = 2" case is anti-architectural, it makes more sense to have it as an "else" rather than as the default. An alternative could have been: unsigned limit = 1; if (!static_call(kvm_x86_get_nmi_mask)(vcpu) && !vcpu->arch.nmi_injected) limit = 2; but the ugly condition makes this solution worse. Next on, commit ab2ee212a57b ("KVM: x86: Save/restore all NMIs when multiple NMIs are pending"). Here, "allows userspace to restore 255 pending NMIs" in the commit message is kinda scary, and I thought about following up with a fixlet to KVM_SET_VCPU_EVENTS: + events->nmi.pending = min(vcpu->arch.nmi_pending, 2); vcpu->arch.nmi_pending = 0; atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending); kvm_make_request(KVM_REQ_NMI, vcpu); but really this isn't needed because process_nmi() does have vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit); So in the end this is also fine, just a remark on the commit message. May be worth an additional comment instead here in KVM_SET_VCPU_EVENTS, before the atomic_set(). On to the actual vNMI patch: + /* + * KVM should never request an NMI window when vNMI is enabled, as KVM + * allows at most one to-be-injected NMI and one pending NMI, i.e. if + * two NMIs arrive simultaneously, KVM will inject one and set + * V_NMI_PENDING for the other. WARN, but continue with the standard + * single-step approach to try and salvage the pending NMI. + */ + WARN_ON_ONCE(is_vnmi_enabled(svm)); Understandable, but also scary. :) I am not sure losing a pending NMI is a big deal. IIRC the "limit = 2" case only matters because Windows uses an NMI shootdown when rebooting the system and in some cases it would hang; but in this case we're in a buggy situation. And it means having to think about how the IRET+single-step method interacts with vNMI, and what is the meaning of !svm->awaiting_iret_completion (tested right below) in this buggy case. I'd just "return" here. And another small nit to conclude - kvm_get_nr_pending_nmis() could be static. The only thing that leaves me a bit puzzled is the naming and rationale of get_vnmi_vmcb_l1(). I'll let you or Santosh clarify that. Thanks, Paolo