On 1/2/2024 8:57 PM, Sean Christopherson wrote:
>> Additionally, while the proposed code fixes a VMX-specific issue, SVM
>> might suffer from a similar problem, as it also uses its own
>> nested_run_pending variable.
>>
>> Reported-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
>> Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@xxxxxxxxxxxxxx
>
> Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")

Thanks!

>> Signed-off-by: Michal Wilczynski <michal.wilczynski@xxxxxxxxx>
>> ---
>>  arch/x86/kvm/vmx/nested.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index c5ec0ef51ff7..44432e19eea6 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -4904,7 +4904,16 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>>
>>  static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
>>  {
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>>  	kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>> +
>> +	/* In case of a triple fault, cancel the nested reentry. This may occur
>
> /*
>  * Multi-line comments should look like this.  Blah blah blab blah blah
>  * blah blah blah blah.
>  */

Sorry, I didn't notice that, and checkpatch didn't complain. In other
subsystems, e.g. networking, this style is not enforced. I will make
sure to remember it next time.

>> +	 * when the RSM instruction fails while attempting to restore the state
>> +	 * from SMRAM.
>> +	 */
>> +	vmx->nested.nested_run_pending = 0;
>
> Argh.  KVM's handling of SMIs while L2 is active is complete garbage.  As explained
> by the comment in vmx_enter_smm(), the L2<->SMM transitions should have a completely
> custom flow and not piggyback/usurp nested VM-Exit/VM-Entry.
>
> 	/*
> 	 * TODO: Implement custom flows for forcing the vCPU out/in of L2 on
> 	 * SMI and RSM.  Using the common VM-Exit + VM-Enter routines is wrong
> 	 * SMI and RSM only modify state that is saved and restored via SMRAM.
> 	 * E.g. most MSRs are left untouched, but many are modified by VM-Exit
> 	 * and VM-Enter, and thus L2's values may be corrupted on SMI+RSM.
> 	 */

I noticed this while working on the issue, and I would be very
interested in taking on this task and implementing the custom flows
mentioned above. I hope you're fine with that.

> As a stop gap, something like this patch is not awful, though I would strongly
> prefer to be more precise and not clear it on all triple faults.  We've had KVM
> bugs where KVM prematurely synthesizes triple fault on an actual nested VM-Enter,
> and those would be covered up by this fix.
>
> But due to nested_run_pending being (unnecessarily) buried in vendor structs, it
> might actually be easier to do a cleaner fix.  E.g. add yet another flag to track
> that a hardware VM-Enter needs to be completed in order to complete instruction
> emulation.

Sounds like a good idea. I will experiment with that approach.

> And as alluded to above, there's another bug lurking.  Events that are *emulated*
> by KVM must not be emulated until KVM knows the vCPU is at an instruction boundary.
> Specifically, enter_smm() shouldn't be invoked while KVM is in the middle of
> instruction emulation (even if "emulation" is just setting registers and skipping
> the instruction).  Theoretically, that could be fixed by honoring the existing
> at_instruction_boundary flag for SMIs, but that'd be a rather large change and
> at_instruction_boundary is nowhere near accurate enough to use right now.
>
> Anyways, before we do anything, I'd like to get Maxim's input on what exactly was
> addressed by 759cbd59674a.

Thank you very much for such a comprehensive review! I've learned a lot.
Will try to help with the mentioned problems.

Michał