On Wed, Jan 18, 2023, Alexandru Matei wrote: > KVM enables 'Enlightened VMCS' and 'Enlightened MSR Bitmap' when running as > a nested hypervisor on top of Hyper-V. When MSR bitmap is updated, > evmcs_touch_msr_bitmap function uses current_vmcs per-cpu variable to mark > that the msr bitmap was changed. > > vmx_vcpu_create() modifies the msr bitmap via vmx_disable_intercept_for_msr > -> vmx_msr_bitmap_l01_changed which in the end calls this function. The > function checks for current_vmcs if it is null but the check is > insufficient because current_vmcs is not initialized. Because of this, the > code might incorrectly write to the structure pointed by current_vmcs value > left by another task. Preemption is not disabled so the current task can > also be preempted and moved to another CPU while current_vmcs is accessed > multiple times from evmcs_touch_msr_bitmap() which leads to crash. > > To fix this problem, this patch moves vmx_disable_intercept_for_msr calls > before init_vmcs call in __vmx_vcpu_reset(), as ->vcpu_reset() is invoked > after the vCPU is properly loaded via ->vcpu_load() and current_vmcs is > initialized. IMO, moving the calls is a band-aid and doesn't address the underlying bug. I don't see any reason why the Hyper-V code should use a per-cpu pointer in this case. It makes sense when replacing VMX sequences that operate on the VMCS, e.g. VMREAD, VMWRITE, etc., but for operations that aren't direct replacements for VMX instructions I think we should have a rule that Hyper-V isn't allowed to touch the per-cpu pointer. E.g. in this case it's trivial to pass down the target (completely untested). Vitaly? --- arch/x86/kvm/vmx/hyperv.h | 12 +++++++----- arch/x86/kvm/vmx/vmx.c | 2 +- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h index ab08a9b9ab7d..ad16b52766bb 100644 --- a/arch/x86/kvm/vmx/hyperv.h +++ b/arch/x86/kvm/vmx/hyperv.h @@ -250,13 +250,15 @@ static inline u16 evmcs_read16(unsigned long field) return *(u16 *)((char *)current_evmcs + offset); } -static inline void evmcs_touch_msr_bitmap(void) +static inline void evmcs_touch_msr_bitmap(struct vcpu_vmx *vmx) { - if (unlikely(!current_evmcs)) + struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs; + + if (WARN_ON_ONCE(!evmcs)) return; - if (current_evmcs->hv_enlightenments_control.msr_bitmap) - current_evmcs->hv_clean_fields &= + if (evmcs->hv_enlightenments_control.msr_bitmap) + evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP; } @@ -280,7 +282,7 @@ static inline u64 evmcs_read64(unsigned long field) { return 0; } static inline u32 evmcs_read32(unsigned long field) { return 0; } static inline u16 evmcs_read16(unsigned long field) { return 0; } static inline void evmcs_load(u64 phys_addr) {} -static inline void evmcs_touch_msr_bitmap(void) {} +static inline void evmcs_touch_msr_bitmap(struct vcpu_vmx *vmx) {} #endif /* IS_ENABLED(CONFIG_HYPERV) */ #define EVMPTR_INVALID (-1ULL) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c788aa382611..6ed6f52aad0c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3937,7 +3937,7 @@ static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx) * bitmap has changed. */ if (static_branch_unlikely(&enable_evmcs)) - evmcs_touch_msr_bitmap(); + evmcs_touch_msr_bitmap(vmx); vmx->nested.force_msr_bitmap_recalc = true; } base-commit: 6e9a476ea49d43a27b42004cfd7283f128494d1d --