On 1/18/2023 6:25 PM, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
>> On Wed, Jan 18, 2023, Vitaly Kuznetsov wrote:
>>> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>>>
>>>> On Wed, Jan 18, 2023, Alexandru Matei wrote:
>>>>> KVM enables 'Enlightened VMCS' and 'Enlightened MSR Bitmap' when running as
>>>>> a nested hypervisor on top of Hyper-V. When MSR bitmap is updated,
>>>>> evmcs_touch_msr_bitmap function uses current_vmcs per-cpu variable to mark
>>>>> that the msr bitmap was changed.
>>>>>
>>>>> vmx_vcpu_create() modifies the msr bitmap via vmx_disable_intercept_for_msr
>>>>> -> vmx_msr_bitmap_l01_changed which in the end calls this function. The
>>>>> function checks for current_vmcs if it is null but the check is
>>>>> insufficient because current_vmcs is not initialized. Because of this, the
>>>>> code might incorrectly write to the structure pointed by current_vmcs value
>>>>> left by another task. Preemption is not disabled so the current task can
>>>>> also be preempted and moved to another CPU while current_vmcs is accessed
>>>>> multiple times from evmcs_touch_msr_bitmap() which leads to crash.
>>>>>
>>>>> To fix this problem, this patch moves vmx_disable_intercept_for_msr calls
>>>>> before init_vmcs call in __vmx_vcpu_reset(), as ->vcpu_reset() is invoked
>>>>> after the vCPU is properly loaded via ->vcpu_load() and current_vmcs is
>>>>> initialized.
>>>>
>>>> IMO, moving the calls is a band-aid and doesn't address the underlying bug. I
>>>> don't see any reason why the Hyper-V code should use a per-cpu pointer in this
>>>> case. It makes sense when replacing VMX sequences that operate on the VMCS, e.g.
>>>> VMREAD, VMWRITE, etc., but for operations that aren't direct replacements for VMX
>>>> instructions I think we should have a rule that Hyper-V isn't allowed to touch the
>>>> per-cpu pointer.
>>>>
>>>> E.g. in this case it's trivial to pass down the target (completely untested).
>>>>
>>>> Vitaly?
>>>
>>> Mid-air collision detected) I've just suggested a very similar approach
>>> but instead of 'vmx->vmcs01.vmcs' I've suggested using
>>> 'vmx->loaded_vmcs->vmcs': in case we're running L2 and loaded VMCS is
>>> 'vmcs02', I think we still need to touch the clean field indicating that
>>> MSR-Bitmap has changed. Equally untested :-)
>>
>> Three reasons to use vmcs01 directly:
>>
>> 1. I don't want to require loaded_vmcs to be set. E.g. in the problematic
>>    flows, this
>>
>>        vmx->loaded_vmcs = &vmx->vmcs01;
>>
>>    comes after the calls to vmx_disable_intercept_for_msr().
>>
>> 2. KVM on Hyper-V doesn't use the bitmaps for L2 (evmcs02):
>>
>>        /*
>>         * Use Hyper-V 'Enlightened MSR Bitmap' feature when KVM runs as a
>>         * nested (L1) hypervisor and Hyper-V in L0 supports it. Enable the
>>         * feature only for vmcs01, KVM currently isn't equipped to realize any
>>         * performance benefits from enabling it for vmcs02.
>>         */
>>        if (IS_ENABLED(CONFIG_HYPERV) && static_branch_unlikely(&enable_evmcs) &&
>>            (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) {
>>                struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs;
>>
>>                evmcs->hv_enlightenments_control.msr_bitmap = 1;
>>        }
>
> Oh, indeed, I've forgotten this. I'm fine with 'vmx->vmcs01' then but
> let's leave a comment (which I've going to also forget about, but still)
> that eMSR bitmap is an L1-only feature.
>
>>
>> 3. KVM's manipulation of MSR bitmaps typically happens _only_ for vmcs01,
>>    e.g. the caller is vmx_msr_bitmap_l01_changed(). The nested case is a
>>    special snowflake.
>>
>

Thanks Sean and Vitaly for your insights and suggestions. I'll redo the
patch using your code Sean if it's ok with you and run the tests again.
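
To make sure I understand the direction before respinning, here is a
minimal user-space sketch of the "pass down the target" idea as I read it.
It is not the actual patch and not kernel code: every model_* name, the
struct layouts, and the bit position of the clean-field flag are
assumptions made up for illustration. The only point it tries to show is
that the helper takes the vCPU and marks the MSR bitmap dirty in its own
vmcs01, so it never reads the per-cpu current_vmcs and does not need
loaded_vmcs to be set.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit position picked for this sketch only; not taken from the kernel. */
#define MODEL_CLEAN_FIELD_MSR_BITMAP (1u << 1)

/* Hypothetical stand-in for the enlightened VMCS page (not the real layout). */
struct model_evmcs {
    uint32_t hv_clean_fields;        /* set bit = L0 may reuse its cached state */
    unsigned int msr_bitmap_enabled; /* models hv_enlightenments_control.msr_bitmap */
};

/* Hypothetical stand-in for the vCPU; only the fields the sketch needs. */
struct model_vcpu {
    struct model_evmcs *vmcs01;      /* models vmx->vmcs01.vmcs */
    struct model_evmcs *loaded_vmcs; /* may still be NULL during vCPU creation */
};

/* Models the per-cpu current_vmcs; it may be stale or unset at this point. */
struct model_evmcs *model_current_vmcs;

/*
 * Mark the MSR bitmap as changed for this vCPU's vmcs01.
 * The target comes from the caller, so the per-cpu pointer and
 * loaded_vmcs are deliberately never consulted.
 */
static void model_touch_msr_bitmap(struct model_vcpu *vcpu)
{
    struct model_evmcs *evmcs = vcpu->vmcs01;

    /* eMSR bitmap is an L1-only feature, hence vmcs01 and nothing else. */
    if (evmcs->msr_bitmap_enabled)
        evmcs->hv_clean_fields &= ~MODEL_CLEAN_FIELD_MSR_BITMAP;
}
```

With this shape, a stale value left in the per-cpu pointer by another task
cannot be written to, and being preempted and migrated between the
accesses no longer matters, since nothing CPU-local is dereferenced.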