On 1/23/2023 3:24 PM, Vitaly Kuznetsov wrote: > Alexandru Matei <alexandru.matei@xxxxxxxxxx> writes: > >> KVM enables 'Enlightened VMCS' and 'Enlightened MSR Bitmap' when running as >> a nested hypervisor on top of Hyper-V. When MSR bitmap is updated, >> evmcs_touch_msr_bitmap function uses current_vmcs per-cpu variable to mark >> that the msr bitmap was changed. >> >> vmx_vcpu_create() modifies the msr bitmap via vmx_disable_intercept_for_msr >> -> vmx_msr_bitmap_l01_changed which in the end calls this function. The >> function checks for current_vmcs if it is null but the check is >> insufficient because current_vmcs is not initialized. Because of this, the >> code might incorrectly write to the structure pointed by current_vmcs value >> left by another task. Preemption is not disabled, the current task can be >> preempted and moved to another CPU while current_vmcs is accessed multiple >> times from evmcs_touch_msr_bitmap() which leads to crash. >> >> The manipulation of MSR bitmaps by callers happens only for vmcs01 so the >> solution is to use vmx->vmcs01.vmcs instead of current_vmcs. >> >> BUG: kernel NULL pointer dereference, address: 0000000000000338 >> PGD 4e1775067 P4D 0 >> Oops: 0002 [#1] PREEMPT SMP NOPTI >> ... >> RIP: 0010:vmx_msr_bitmap_l01_changed+0x39/0x50 [kvm_intel] >> ... >> Call Trace: >> vmx_disable_intercept_for_msr+0x36/0x260 [kvm_intel] >> vmx_vcpu_create+0xe6/0x540 [kvm_intel] >> ? __vmalloc_node+0x4a/0x70 >> kvm_arch_vcpu_create+0x1d1/0x2e0 [kvm] >> kvm_vm_ioctl_create_vcpu+0x178/0x430 [kvm] >> ? __handle_mm_fault+0x3cb/0x750 >> kvm_vm_ioctl+0x53f/0x790 [kvm] >> ? syscall_exit_work+0x11a/0x150 >> ? syscall_exit_to_user_mode+0x12/0x30 >> ? do_syscall_64+0x69/0x90 >> ? handle_mm_fault+0xc5/0x2a0 >> __x64_sys_ioctl+0x8a/0xc0 >> do_syscall_64+0x5c/0x90 >> ? exc_page_fault+0x62/0x150 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> >> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx> >> Signed-off-by: Alexandru Matei <alexandru.matei@xxxxxxxxxx> >> --- >> v2: >> - pass (e)vmcs01 to evmcs_touch_msr_bitmap >> - use loaded_vmcs * instead of vcpu_vmx * to avoid >> including vmx.h which generates circular dependency >> >> v1: https://lore.kernel.org/kvm/20230118141348.828-1-alexandru.matei@xxxxxxxxxx/ >> >> arch/x86/kvm/vmx/hyperv.h | 16 +++++++++++----- >> arch/x86/kvm/vmx/vmx.c | 2 +- >> 2 files changed, 12 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h >> index 571e7929d14e..132e32e57d2d 100644 >> --- a/arch/x86/kvm/vmx/hyperv.h >> +++ b/arch/x86/kvm/vmx/hyperv.h >> @@ -190,13 +190,19 @@ static inline u16 evmcs_read16(unsigned long field) >> return *(u16 *)((char *)current_evmcs + offset); >> } >> >> -static inline void evmcs_touch_msr_bitmap(void) >> +static inline void evmcs_touch_msr_bitmap(struct loaded_vmcs *vmcs01) > > Personally, I would've followed Sean's suggestion and passed "struct > vcpu_vmx *vmx" as 'loaded_vmcs' here is a bit ambiguos.... I couldn't use vcpu_vmx *, it requires including vmx.h in hyperv.h and it generates a circular depedency which leads to compile errors. > >> { >> - if (unlikely(!current_evmcs)) >> + /* >> + * Enlightened MSR Bitmap feature is enabled only for L1, i.e. >> + * always operates on (e)vmcs01 >> + */ >> + struct hv_enlightened_vmcs* evmcs = (void*)vmcs01->vmcs; > > (Nit: a space between "void" and "*") > > .. or, alternatively, you can directly pass > (struct hv_enlightened_vmcs *)vmx->vmcs01.vmcs as e.g. 'current_evmcs' > and avoid the conversion here. OK, sounds good, I'll pass hv_enlightened_vmcs * directly. > >> + >> + if (WARN_ON_ONCE(!evmcs)) >> return; >> >> - if (current_evmcs->hv_enlightenments_control.msr_bitmap) >> - current_evmcs->hv_clean_fields &= >> + if (evmcs->hv_enlightenments_control.msr_bitmap) >> + evmcs->hv_clean_fields &= >> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP; >> } >> >> @@ -219,7 +225,7 @@ static inline u64 evmcs_read64(unsigned long field) { return 0; } >> static inline u32 evmcs_read32(unsigned long field) { return 0; } >> static inline u16 evmcs_read16(unsigned long field) { return 0; } >> static inline void evmcs_load(u64 phys_addr) {} >> -static inline void evmcs_touch_msr_bitmap(void) {} >> +static inline void evmcs_touch_msr_bitmap(struct loaded_vmcs *vmcs01) {} >> #endif /* IS_ENABLED(CONFIG_HYPERV) */ >> >> #define EVMPTR_INVALID (-1ULL) >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index fe5615fd8295..2a3be8e8a1bf 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -3869,7 +3869,7 @@ static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx) >> * bitmap has changed. >> */ >> if (static_branch_unlikely(&enable_evmcs)) >> - evmcs_touch_msr_bitmap(); >> + evmcs_touch_msr_bitmap(&vmx->vmcs01); >> >> vmx->nested.force_msr_bitmap_recalc = true; >> } >