On Wed, Apr 26, 2023, Paolo Bonzini wrote:
> On Wed, Apr 26, 2023 at 10:02 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > This is probably the sub-PR for which I'm more interested in giving
> > > the code a closer look, but this is more about understanding the
> > > changes than it is about expecting something bad in it.
> >
> > 100% agree.  If you were to scrutinize only one thing for 6.4, the vNMI changes
> > are definitely my choice for extra eyeballs.
>
> Interesting read. The split commits left me wondering _why_ patches
> 1-7 were needed for vNMI, but that's a known limitation of losing the
> cover letter, and the Link or Message-Id trailers try to amend for
> that.
>
> I have a few comments indeed, most of which are absolutely nits and
> can be ignored or fixed as follow-ups. It's my turn to send a "belated
> review" patch series, which I'll do for -rc2, but please check if
> there are any disagreements.
>
> First of all, this comment caught my attention:
>
> +	/*
> +	 * Rules for synchronizing int_ctl bits from vmcb02 to vmcb01:
> +	 *
> +	 * V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR:  If L1 doesn't
> +	 * intercept interrupts, then KVM will use vmcb02's V_IRQ (and related
> +	 * flags) to detect interrupt windows for L1 IRQs (even if L1 uses
> +	 * virtual interrupt masking).  Raise KVM_REQ_EVENT to ensure that
> +	 * KVM re-requests an interrupt window if necessary, which implicitly
> +	 * copies this bits from vmcb02 to vmcb01.
> +	 *
> +	 * V_TPR: If L1 doesn't use virtual interrupt masking, then L1's vTPR
> +	 * is stored in vmcb02, but its value doesn't need to be copied from/to
> +	 * vmcb01 because it is copied from/to the virtual APIC's TPR register
> +	 * on each VM entry/exit.
> +	 *
> +	 * V_GIF: If nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's
> +	 * V_GIF.  However, GIF is architecturally clear on each VM exit, thus
> +	 * there is no need to copy V_GIF from vmcb02 to vmcb01.
> +	 */
>
> "Rules for synchronizing int_ctl bits from vmcb02 to vmcb01" suggests
> that this is work done here, and it also misled me into looking at
> nested_sync_control_from_vmcb02 (which is instead about vmcb02 ->
> vmcb12).

+1.  I had a similar reaction when I first saw the code, but learned to live
with it after a few versions :-)

> So what about replacing it with
>
> * int_ctl bits from vmcb02 have to be synchronized to both vmcb12
> and vmcb01.
> * The former is in nested_sync_control_from_vmcb02, invoked on every vmexit,
> * while the latter is scattered all over the place:
>
> and perhaps also call out nested_svm_virtualize_tpr(), sync_lapic_to_cr8() and
> sync_cr8_to_lapic in the V_TPR part?
>
> Another super small thing which is not worth a respin (would have been):
>
>   Subject: [PATCH 05/13] KVM: x86: Raise an event request when processing NMIs
>   - if an NMI is pending
>   + iff an NMI is pending
>
> The "if" suggests that we were missing an event request, while "iff"
> suggests that we were doing them unnecessarily.

Gah, surprised I didn't catch that, I do love me some "iff".

> As an aside, I like the "coding style violation" of commit
> 400fee8c9b2d.  Because the "limit = 2" case is anti-architectural, it
> makes more sense to have it as an "else" rather than as the default.
> An alternative could have been:
>
>   unsigned limit = 1;
>   if (!static_call(kvm_x86_get_nmi_mask)(vcpu) && !vcpu->arch.nmi_injected)
>       limit = 2;
>
> but the ugly condition makes this solution worse.
>
> Next on, commit ab2ee212a57b ("KVM: x86: Save/restore all NMIs when
> multiple NMIs are pending").
> Here, "allows userspace to restore 255
> pending NMIs" in the commit message is kinda scary, and I thought
> about following up with a fixlet to KVM_SET_VCPU_EVENTS:
>
> +		events->nmi.pending = min(vcpu->arch.nmi_pending, 2);
> 		vcpu->arch.nmi_pending = 0;
> 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> 		kvm_make_request(KVM_REQ_NMI, vcpu);
>
> but really this isn't needed because process_nmi() does have
>
> 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
>
> So in the end this is also fine, just a remark on the commit message.

Even restoring 255 NMIs would be fine from KVM's perspective.  The guest might
not be happy, but that's likely true if there are _any_ spurious NMIs.  IIRC, I
didn't call out the process_nmi() behavior because having 255 pending virtual
NMIs doesn't put KVM at risk any more than does having 2 pending virtual NMIs.

> May be worth an additional comment instead here in
> KVM_SET_VCPU_EVENTS, before the atomic_set().
>
> On to the actual vNMI patch:
>
> +	/*
> +	 * KVM should never request an NMI window when vNMI is enabled, as KVM
> +	 * allows at most one to-be-injected NMI and one pending NMI, i.e. if
> +	 * two NMIs arrive simultaneously, KVM will inject one and set
> +	 * V_NMI_PENDING for the other.  WARN, but continue with the standard
> +	 * single-step approach to try and salvage the pending NMI.
> +	 */
> +	WARN_ON_ONCE(is_vnmi_enabled(svm));
>
> Understandable, but also scary. :)  I am not sure losing a pending NMI
> is a big deal.  IIRC the "limit = 2" case only matters because Windows
> uses an NMI shootdown when rebooting the system and in some cases it
> would hang; but in this case we're in a buggy situation.  And it means
> having to think about how the IRET+single-step method interacts with
> vNMI, and what is the meaning of !svm->awaiting_iret_completion
> (tested right below) in this buggy case.  I'd just "return" here.
Heh, Santosh originally had it return and I had the opposite reaction: why bail
and *guarantee* problems for the guest, instead of continuing on and *maybe*
causing problems for the guest.

 : Last thought, unless there's something that will obviously break, it's probably
 : better to WARN and continue than to bail.  I.e. do the single-step and hope for
 : the best.  Bailing at this point doesn't seem like it would help.

I don't have a super strong preference.  As you said, KVM is already in a buggy
scenario, though my vote is still to carry on.

https://lkml.kernel.org/r/Y9mwz%2FG6%2BG8NSX3%2B%40google.com

> And another small nit to conclude - kvm_get_nr_pending_nmis() could be static.
>
> The only thing that leaves me a bit puzzled is the naming and
> rationale of get_vnmi_vmcb_l1().  I'll let you or Santosh clarify that.

Ah, I think what happened is that I complained about is_vnmi_enabled() being
misleading (https://lore.kernel.org/all/Y9m0q31NBmsnhVGD@xxxxxxxxxx), but
instead of renaming the top-level helper, Santosh added an inner helper and
here we are.

Re-reading what I wrote, and looking at the code with fresh eyes, I don't think
I agree with past me regarding the name of is_vnmi_enabled().  My biggest
objection/confusion with the original code was the comment saying vNMI was
"inhibited".  Appending "for_l1()" makes the usage in the callers quite
confusing.
So my vote is to do:

  static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
  {
	if (!vnmi)
		return false;

	if (is_guest_mode(&svm->vcpu))
		return false;

	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE_MASK);
  }

---
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f44751dd8d5d..5b604565d4b3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -556,25 +556,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
-static inline struct vmcb *get_vnmi_vmcb_l1(struct vcpu_svm *svm)
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
 {
 	if (!vnmi)
-		return NULL;
+		return false;
 
 	if (is_guest_mode(&svm->vcpu))
-		return NULL;
-	else
-		return svm->vmcb01.ptr;
-}
-
-static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
-{
-	struct vmcb *vmcb = get_vnmi_vmcb_l1(svm);
-
-	if (vmcb)
-		return !!(vmcb->control.int_ctl & V_NMI_ENABLE_MASK);
-	else
 		return false;
+
+	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE_MASK);
 }
 
 /* svm.c */