On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
> On 6.04.2022 21:48, Sean Christopherson wrote:
> > On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
> > > On 6.04.2022 19:10, Sean Christopherson wrote:
> > > > On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
> > > And what if it's L0 that is trying to inject a NMI into L2?
> > > In this case is_guest_mode() is true, but the full NMI injection machinery
> > > should be used.
> >
> > Gah, you're right, I got misled by a benign bug in nested_vmx_l1_wants_exit() and
> > was thinking that NMIs always exit.  The "L1 wants" part should be conditioned on
> > NMI exiting being enabled.  It's benign because KVM always wants "real" NMIs, and
> > so the path is never encountered.
> >
> > @@ -5980,7 +6005,7 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
> >         switch ((u16)exit_reason.basic) {
> >         case EXIT_REASON_EXCEPTION_NMI:
> >                 intr_info = vmx_get_intr_info(vcpu);
> > -               if (is_nmi(intr_info))
> > +               if (is_nmi(intr_info) && nested_cpu_has_nmi_exiting(vmcs12))
> >                         return true;
> >                 else if (is_page_fault(intr_info))
> >                         return true;
>
> I guess you mean "benign" when having KVM as L1, since other hypervisors may
> let their L2s handle NMIs themselves.

No, this one's truly benign.  The nVMX exit processing is:

	if (nested_vmx_l0_wants_exit())
		handle in L0 / KVM;

	if (nested_vmx_l1_wants_exit())
		handle in L1

	handle in L0 / KVM

Since this is for actual hardware NMIs, the first "L0 wants" check always returns
true for NMIs, so the fact that KVM screws up L1's wants is a non-issue.

> > > It is also incorrect to block L1 -> L2 NMI injection because either L1
> > > or L2 is currently under NMI blocking: the first case is obvious,
> > > the second because it's L1 that is supposed to take care of proper NMI
> > > blocking for L2 when injecting an NMI there.
> >
> > Yep, but I don't think there's a bug here.  At least not for nVMX.
> I agree this scenario should currently work (including on nSVM) - mentioned
> it just as a constraint on solution space.
>
> > > > > With the code in my previous patch set I planned to use
> > > > > exit_during_event_injection() to detect such case, but if we implement
> > > > > VMCB12 EVENTINJ parsing we can simply add a flag that the relevant event
> > > > > comes from L1, so its normal injection side-effects should be skipped.
> > > >
> > > > Do we still need a flag based on the above?  Honest question... I've been staring
> > > > at all this for the better part of an hour and may have lost track of things.
> > >
> > > If checking just is_guest_mode() is not enough due to reasons I described
> > > above then we need to somehow determine in the NMI / IRQ injection handler
> > > whether the event to be injected into L2 comes from L0 or L1.
> > > For this (assuming we do VMCB12 EVENTINJ parsing) I think we need an extra flag.
> >
> > Yes :-(  And I believe the extra flag would need to be handled by KVM_{G,S}ET_VCPU_EVENTS.
>
> Another option for saving and restoring a VM would be to add it to
> KVM_{GET,SET}_NESTED_STATE somewhere (maybe as a part of the saved VMCB12
> control area?).

Ooh.  What if we keep nested_run_pending=true until the injection completes?  Then
we don't even need an extra flag because nested_run_pending effectively says that
any and all injected events are for L1=>L2.  In KVM_GET_NESTED_STATE, shove the
to-be-injected event into the normal vmc*12 injection field, and ignore all
to-be-injected events in KVM_GET_VCPU_EVENTS if nested_run_pending=true.

That should work even for migrating to an older KVM, as keeping nested_run_pending
will cause the target to reprocess the event injection as if it were from nested
VM-Enter, which it technically is.
We could probably get away with completely dropping the intermediate event as the
vmc*12 should still have the original event, but that technically could result in
architecturally incorrect behavior, e.g. if vectoring up to the point of
interception sets A/D bits in the guest.