Anirudh Rayabharam <anrayabh@xxxxxxxxxxxxxxxxxxx> writes: > On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote: >> Sean Christopherson <seanjc@xxxxxxxxxx> writes: >> >> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote: >> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote: >> >> ... >> >> >> > >> >> > Any reason not to use the already sanitized vmcs_config? I can't think of any >> >> > reason why the nested path should blindly use the raw MSR values from hardware. >> >> >> >> vmcs_config has the sanitized exec controls. But how do we construct MSR >> >> values using them? >> > >> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then >> > take the required-1 bits from the CPU. And then if we wanted to avoid the redundant >> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config. >> > >> > Hastily constructed and compile-tested only, proceed with caution :-) >> >> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and >> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular >> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs() >> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up >> the suggested patch and try to construct something for required-1 bits. > > I tried this patch today but it causes some regression which causes > /dev/kvm to be unavailable in L1. I didn't get a chance to look into it > closely but I am guessing it has something to do with the fact that > vmcs_config reflects the config that L0 chose to use rather than what is > available to use. So constructing allowed-1 MSR bits based on what bits > are set in exec controls maybe isn't correct. I've tried to pick it up but it's actually much harder than I think. The patch has some minor issues ('&vmcs_config.nested' needs to be switched to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main problem is that the set of controls nested_vmx_setup_ctls_msrs() needs is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to identify at least: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5e14e4c40007..8076352174ad 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, CPU_BASED_INVLPG_EXITING | CPU_BASED_RDPMC_EXITING; - opt = CPU_BASED_TPR_SHADOW | + opt = CPU_BASED_INTR_WINDOW_EXITING | + CPU_BASED_NMI_WINDOW_EXITING | + CPU_BASED_TPR_SHADOW | + CPU_BASED_USE_IO_BITMAPS | CPU_BASED_USE_MSR_BITMAPS | + CPU_BASED_MONITOR_TRAP_FLAG | + CPU_BASED_RDTSC_EXITING | + CPU_BASED_PAUSE_EXITING | CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | CPU_BASED_ACTIVATE_TERTIARY_CONTROLS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS, @@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, #endif opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_PAT | + VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_EFER | VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_PT_CONCEAL_PIP | @@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR; min = VM_ENTRY_LOAD_DEBUG_CONTROLS; - opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | + opt = +#ifdef CONFIG_X86_64 + VM_ENTRY_IA32E_MODE | +#endif + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_LOAD_BNDCFGS | but it is 1) not sufficient because some controls are smartly filtered out just because we don't want them for L1 -- and this doesn't mean that L2 doesn't need them and 2) because if we add some 'opt' controls to setup_vmcs_config() we need to filter them out somewhere else. I'm starting to think we may just want to store raw VMX MSR values in vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug, perf_ctrl bug,..) and then do the adjust_vmx_controls() magic. I'm not giving up yet but don't expect something small and backportable to stable :-) -- Vitaly