On Mon, Jul 15, 2024, Kishen Maloor wrote: > This change aims to resolve a TODO documented in commit 5d76b1f8c793 > ("KVM: nVMX: Rename nested.vmcs01_* fields to nested.pre_vmenter_*"): > > /* > * TODO: Implement custom flows for forcing the vCPU out/in of L2 on > * SMI and RSM. Using the common VM-Exit + VM-Enter routines is wrong > * SMI and RSM only modify state that is saved and restored via SMRAM. Sorry, but this does not implement what the TODO suggest. Specifically, it touches _far_ more state than what is saved/restored in SMRAM. Implementing custom SMI+RSM handling will require an annoying amount of coding, which is why it hasn't been done yet. > * E.g. most MSRs are left untouched, but many are modified by VM-Exit > * and VM-Enter, and thus L2's values may be corrupted on SMI+RSM. > */ > ... > +int nested_vmx_leave_smm(struct kvm_vcpu *vcpu) > +{ > + enum vm_entry_failure_code entry_failure_code; > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + kvm_service_local_tlb_flush_requests(vcpu); > + > + vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); > + > + prepare_vmcs02_early(vmx, &vmx->vmcs01, vmcs12); > + > + if (prepare_vmcs02(vcpu, vmcs12, false, &entry_failure_code)) { > + vmx_switch_vmcs(vcpu, &vmx->vmcs01); > + nested_vmx_restore_host_state(vcpu); This is blatantly wrong, a failure during RSM results in shutdown. And I'm 99% certain that restoring "critical VMX state" can't fail, i.e. this path shouldn't exist at all. Per the SDM, critical state is saved in a uarch-specific location, i.e. it can't be accessed by software and thus isn't validated on RSM. And interestingly, CR0/4 fixed bits are _forced_, not validated: set to their fixed values any bits in CR0 and CR4 whose values must be fixed in VMX operation (see Section 24.8); Ditto for CS/SS RPL vs. DPL.