Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> writes: > Anirudh Rayabharam <anrayabh@xxxxxxxxxxxxxxxxxxx> writes: > >> When running cloud-hypervisor tests, VM entry into an L2 guest on KVM on >> Hyper-V fails with this splat (stripped for brevity): >> >> [ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel] >> [ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu >> [ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021 >> [ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel] >> [ 1481.600447] Call Trace: >> [ 1481.600449] <TASK> >> [ 1481.600451] nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel] >> [ 1481.600457] __vmx_handle_exit+0xef/0x670 [kvm_intel] >> [ 1481.600467] vmx_handle_exit+0x12/0x50 [kvm_intel] >> [ 1481.600472] vcpu_enter_guest+0x83a/0xfd0 [kvm] >> [ 1481.600524] vcpu_run+0x5e/0x240 [kvm] >> [ 1481.600560] kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm] >> [ 1481.600597] kvm_vcpu_ioctl+0x29a/0x6d0 [kvm] >> [ 1481.600634] __x64_sys_ioctl+0x91/0xc0 >> [ 1481.600637] do_syscall_64+0x5c/0xc0 >> [ 1481.600667] entry_SYSCALL_64_after_hwframe+0x44/0xae >> [ 1481.600670] RIP: 0033:0x7f688becdaff >> [ 1481.600686] </TASK> >> >> TSC multiplier field is currently not supported in EVMCS in KVM. It was >> previously not supported from Hyper-V but has been added since. Because >> it is not supported in KVM the use "TSC scaling control" is filtered out >> of vmcs_config by evmcs_sanitize_exec_ctrls(). >> >> However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1. >> eVMCS unsupported fields are not sanitized. When L1 tries to launch an L2 >> guest, vmcs12 has TSC scaling enabled. This propagates to vmcs02. But KVM >> doesn't set the TSC multiplier value because kvm_has_tsc_control is false. >> Due to this VM entry for L2 guest fails. (VM entry fails if >> "use TSC scaling" is 1 but TSC multiplier is 0.) >> >> To fix, in nested_vmx_setup_ctls_msrs(), sanitize the values read from MSRs >> by filtering out fields that are not supported by eVMCS. >> >> This is a stable-friendly intermediate fix. A more comprehensive fix is >> in progress [1] but is probably too complicated to safely apply to >> stable. >> >> [1]: https://lore.kernel.org/kvm/20220627160440.31857-1-vkuznets@xxxxxxxxxx/ >> >> Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling") >> Signed-off-by: Anirudh Rayabharam <anrayabh@xxxxxxxxxxxxxxxxxxx> >> --- >> >> Changes since v1: >> - Sanitize all eVMCS unsupported fields instead of just TSC scaling. >> >> v1: https://lore.kernel.org/lkml/20220613161611.3567556-1-anrayabh@xxxxxxxxxxxxxxxxxxx/ >> >> --- >> arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c >> index f5cb18e00e78..f88d748c7cc6 100644 >> --- a/arch/x86/kvm/vmx/nested.c >> +++ b/arch/x86/kvm/vmx/nested.c >> @@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) >> msrs->pinbased_ctls_high); >> msrs->pinbased_ctls_low |= >> PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR; >> +#if IS_ENABLED(CONFIG_HYPERV) >> + if (static_branch_unlikely(&enable_evmcs)) >> + msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL; >> +#endif >> msrs->pinbased_ctls_high &= >> PIN_BASED_EXT_INTR_MASK | >> PIN_BASED_NMI_EXITING | >> @@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) >> msrs->exit_ctls_low = >> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR; >> >> +#if IS_ENABLED(CONFIG_HYPERV) >> + if (static_branch_unlikely(&enable_evmcs)) >> + msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL; >> +#endif >> msrs->exit_ctls_high &= >> #ifdef CONFIG_X86_64 >> VM_EXIT_HOST_ADDR_SPACE_SIZE | >> @@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) >> msrs->entry_ctls_high); >> msrs->entry_ctls_low = >> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; >> +#if IS_ENABLED(CONFIG_HYPERV) >> + if (static_branch_unlikely(&enable_evmcs)) >> + msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL; >> +#endif >> msrs->entry_ctls_high &= >> #ifdef CONFIG_X86_64 >> VM_ENTRY_IA32E_MODE | >> @@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) >> msrs->secondary_ctls_high); >> >> msrs->secondary_ctls_low = 0; >> +#if IS_ENABLED(CONFIG_HYPERV) >> + if (static_branch_unlikely(&enable_evmcs)) >> + msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC; >> +#endif >> msrs->secondary_ctls_high &= >> SECONDARY_EXEC_DESC | >> SECONDARY_EXEC_ENABLE_RDTSCP | > > (In theory, threre's also EVMCS1_UNSUPPORTED_VMFUNC filtering out > VMX_VMFUNC_EPTP_SWITCHING (as eVMCS EPTP_LIST_ADDRESS) but it is not > used by KVM) > > As I said in another thread, I think this is fine as a > stable@/intermediate fix. Assuming the way to go for mainline is my > "KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs", this patch > won't be needed and can be reverted. (I've already said that in another thread, putting it here for visibility) I have to take this back. As-is, the patch is likely to break live migration because of the reasons expressed by Jim: https://lore.kernel.org/kvm/CALMp9eSBLcvuNDquvSfUnaF3S3f4ZkzqDRSsz-v93ZeX=xnssg@xxxxxxxxxxxxxx/ In case we filter out SECONDARY_EXEC_TSC_SCALING here in nested_vmx_setup_ctls_msrs(), KVM_SET_MSRS on the destination will fail because the data will have an unsupported bit. It seems the easiest way to go for stable@ is to actually enable TSC scaling, like the 4 patches I've suggested here: https://lore.kernel.org/kvm/87bkujy4z9.fsf@xxxxxxxxxx/ The full solution including vmcs_config usage for nested_vmx_setup_ctls_msrs() is 28 patches now, it's likely a no-go for stable :-( -- Vitaly