Alexandru Matei <alexandru.matei@xxxxxxxxxx> writes:

> KVM enables 'Enlightened VMCS' and 'Enlightened MSR Bitmap' when running
> as a nested hypervisor on top of Hyper-V. When the MSR bitmap is updated,
> evmcs_touch_msr_bitmap() uses the current_vmcs per-cpu variable to mark
> that the MSR bitmap was changed.
>
> vmx_vcpu_create() modifies the MSR bitmap via vmx_disable_intercept_for_msr()
> -> vmx_msr_bitmap_l01_changed(), which in the end calls this function. The
> function checks current_vmcs for NULL, but the check is insufficient
> because current_vmcs is not yet initialized.

I'm failing to remember why evmcs_touch_msr_bitmap() needs 'current_vmcs'
at all, can we just use 'vmx->loaded_vmcs->vmcs' (after passing a 'vmx'
parameter to it) instead?

> Because of this, the code might incorrectly write to the structure
> pointed to by the current_vmcs value left by another task. Preemption is
> not disabled, so the current task can also be preempted and moved to
> another CPU while current_vmcs is accessed multiple times from
> evmcs_touch_msr_bitmap(), which leads to a crash.
>
> To fix this problem, this patch moves the vmx_disable_intercept_for_msr()
> calls before the init_vmcs() call in __vmx_vcpu_reset(), as ->vcpu_reset()
> is invoked after the vCPU is properly loaded via ->vcpu_load() and
> current_vmcs is initialized.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000338
> PGD 4e1775067 P4D 0
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> ...
> RIP: 0010:vmx_msr_bitmap_l01_changed+0x39/0x50 [kvm_intel]
> ...
> Call Trace:
>  vmx_disable_intercept_for_msr+0x36/0x260 [kvm_intel]
>  vmx_vcpu_create+0xe6/0x540 [kvm_intel]
>  ? __vmalloc_node+0x4a/0x70
>  kvm_arch_vcpu_create+0x1d1/0x2e0 [kvm]
>  kvm_vm_ioctl_create_vcpu+0x178/0x430 [kvm]
>  ? __handle_mm_fault+0x3cb/0x750
>  kvm_vm_ioctl+0x53f/0x790 [kvm]
>  ? syscall_exit_work+0x11a/0x150
>  ? syscall_exit_to_user_mode+0x12/0x30
>  ? do_syscall_64+0x69/0x90
>  ? handle_mm_fault+0xc5/0x2a0
>  __x64_sys_ioctl+0x8a/0xc0
>  do_syscall_64+0x5c/0x90
>  ? exc_page_fault+0x62/0x150
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Signed-off-by: Alexandru Matei <alexandru.matei@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/vmx.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fe5615fd8295..168138dfb0b4 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4735,6 +4735,22 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> +#ifdef CONFIG_X86_64
> +	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> +#endif
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> +	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_cstate_in_guest(vcpu->kvm)) {
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
> +	}
> +
>  	init_vmcs(vmx);
>
>  	if (nested)
> @@ -7363,22 +7379,6 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>  	bitmap_fill(vmx->shadow_msr_intercept.read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
>  	bitmap_fill(vmx->shadow_msr_intercept.write, MAX_POSSIBLE_PASSTHROUGH_MSRS);

I think the vmx_disable_intercept_for_msr() calls you move use
'vmx->shadow_msr_intercept.read'/'vmx->shadow_msr_intercept.write' from
above, so there's a dependency here. Let's avoid the churn (if possible).
>
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
> -#ifdef CONFIG_X86_64
> -	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> -#endif
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> -	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> -	if (kvm_cstate_in_guest(vcpu->kvm)) {
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> -		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
> -	}
> -
>  	vmx->loaded_vmcs = &vmx->vmcs01;
>
>  	if (cpu_need_virtualize_apic_accesses(vcpu)) {

-- 
Vitaly