> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Sent: Thursday, December 30, 2021 1:26 AM
>
> On Wed, Dec 29, 2021, Tian, Kevin wrote:
> > > From: Tian, Kevin
> > > Sent: Wednesday, December 29, 2021 11:35 AM
> > >
> > > > Speaking of nested, interception of #NM in vmx_update_exception_bitmap()
> > > > is wrong with respect to nested guests. Until XFD is supported for L2,
> > > > which I didn't see in this series, #NM should not be intercepted while
> > > > L2 is running.
> > >
> > > Can you remind me what additional work is required to support XFD for L2?
> >
> > OK, at least we require additional work on when to disable write
> > interception. It can be done only when both L0 and L1 make the same
> > decision, just like what we did for many other VMCS settings...
>
> I'm not terribly concerned with exposing XFD to L2 right now, I'm much more
> concerned with not breaking nVMX when XFD is exposed to L1.
>
> > > If it's only about performance, I prefer the current conservative
> > > approach as the first step. As explained earlier, #NM should be rare if
> > > the guest doesn't run AMX applications at all. Adding nested into this
> > > picture doesn't make things a lot worse.
> >
> > All your comments are incorporated except this one. As said, always
> > trapping #NM for L2 is not a big problem, as long as it's rare and doesn't
> > break functionality. Usually a relatively static scheme is safer than a
> > dynamic one in case anything is overlooked in the initial implementation. 😊
>
> I strongly disagree, it's not automagically safer. Either way, we need to
> validate that KVM does the correct thing with respect to vmcs12. E.g. does
> KVM correctly reflect the #NM back to L2 when it's not intercepted by L1?
> Does it synthesize a nested VM-Exit when it is intercepted by L1? The
> testing required doesn't magically go away.
>
> As posted, there is zero chance that the patches correctly handle #NM in L2,
> because nested_vmx_l0_wants_exit() doesn't prevent an #NM from being
> forwarded to L1. E.g. if L0 (KVM) and L1 both intercept #NM,
> handle_exception_nm() will queue a #NM for injection and then synthesize a
> nested VM-Exit, i.e. the #NM from L2 will get injected into L1 after the
> nested exit.

Yes, it's incorrect behavior.

> That also means handle_exception_nmi_irqoff() => handle_exception_nm() is
> fatally broken on non-XFD hardware, as it will attempt
> RDMSR(MSR_IA32_XFD_ERR) if L1 intercepts #NM, since
> handle_exception_nmi_irqoff() will run before __vmx_handle_exit() =>
> nested_vmx_reflect_vmexit() checks whether L0 or L1 should handle the exit.

Ditto. Thanks for pointing out these facts, which we obviously overlooked.

So handle_exception_nm() should neither blindly read xfd_err (the #NM might
be caused by L1 interception on a non-XFD platform) nor blindly queue a #NM
exception (triggered in L2) to L1, which intercepts #NM and thus expects a
nested VM-exit instead.

The former suggests that reading xfd_err should be made conditional, similar
to what we did in vmx_update_exception_bitmap(). The latter suggests that
queuing the actual exception is better postponed until __vmx_handle_exit(),
after nested_vmx_reflect_vmexit() has decided whether L0 or L1 owns the exit.
We are looking at this change now (a rough sketch of both is at the end of
this mail).

And once the #NM handler correctly handles interception by either L0 or L1,
I'm not sure whether we still want to special-case L1 vs. L2 in
vmx_update_exception_bitmap(), since in the end L0 wants to intercept #NM to
save xfd_err for both L1 and L2 (if L1 exposes the XFD capability to L2).

Thanks
Kevin
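
P.S. To make the above a bit more concrete, here is a rough, untested sketch
of the conditional xfd_err read we have in mind. The guest_fpu xfd/xfd_err
fields follow this series; treat the exact names and placement as assumptions
rather than the final patch:

/* Sketch only: field names follow this series and may still change. */
static void handle_exception_nm(struct kvm_vcpu *vcpu)
{
        /*
         * Read MSR_IA32_XFD_ERR only when a non-zero guest XFD value
         * could actually have caused this #NM.  A #NM intercepted on
         * behalf of L1 (possibly on hardware without XFD at all) must
         * not touch the MSR; the exception itself is queued later, in
         * the normal exit path (see the second sketch below).
         */
        if (vcpu->arch.guest_fpu.fpstate->xfd)
                rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}

i.e. the MSR is only touched when XFD could have been the cause, which also
keeps the irqoff path away from the MSR on non-XFD hardware.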
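
And a matching sketch for deferring the #NM queuing to the normal exit path,
i.e. handle_exception_nmi(), which is only reached once __vmx_handle_exit()
=> nested_vmx_reflect_vmexit() has decided that L0 owns the exit. is_nm_fault()
is assumed here as a helper in the style of the existing is_*_fault() checks
in vmcs.h:

        /*
         * Sketch only: early in handle_exception_nmi(), queue the #NM
         * here rather than in handle_exception_nmi_irqoff(), so that an
         * exit intercepted by L1 has already been turned into a nested
         * VM-exit by nested_vmx_reflect_vmexit() and never reaches this
         * point.
         */
        if (is_nm_fault(intr_info)) {
                kvm_queue_exception(vcpu, NM_VECTOR);
                return 1;
        }

If we get this right, the irqoff path never consumes xfd_err for an exit that
belongs to L1, and an L1-intercepted #NM becomes a nested VM-exit instead of
an exception injected into L1.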