If guest LBR is disabled at vCPU sched-in time, the vLBR event is
released. Any subsequent guest access to the LBR MSRs is then trapped
and causes KVM to create a new vLBR event. If this new vLBR event is
the only user of the host LBR facility, the host LBR driver resets the
LBR facility when the event is created. So guest LBR content may change
across a vCPU sched-out and sched-in.

Consider this sequence:
1. Guest disables LBR.
2. Guest starts reading the LBR MSRs, but doesn't finish.
3. vCPU is sched-out and later sched-in; the vLBR event is released.
4. Guest continues reading the LBR MSRs and KVM creates the vLBR event
   again. If this vLBR event is the only LBR user on the host at that
   point, the host LBR driver resets the HW LBR facility at vLBR
   creation.
5. Guest reads the remaining LBR MSRs in their reset state.

So the guest LBR MSR values read before the vCPU sched-out are correct,
while the values read after it are wrong and are in reset state.
Similarly, guest LBR MSR writes made before the vCPU sched-out are lost
and end up in reset state, while writes made after it are correct.

This is a bug: guest LBR content changes as a side effect of vCPU
scheduling. It can happen whenever a guest's LBR MSR accesses span a
vCPU scheduling boundary. A guest usually accesses the LBR MSRs at
guest task switch and in the PMI handler.

Two options could be used to fix this bug:
a. Save a guest LBR snapshot at vLBR release in step 3, then restore
   the guest LBR after vLBR creation in step 4. But there are nearly
   100 LBR MSRs, so each vLBR release would need about 100 MSR reads
   and 100 MSR writes; the overhead is too heavy.
b. Defer the vLBR release in step 3.

This commit chooses option b. Guest LBR MSR accesses are passed
through, so the interceptable guest DEBUGCTLMSR_LBR bit is used to
predict guest LBR usage. If guest LBR is disabled for a whole vCPU
sched time slice, KVM predicts that guest LBR won't be used soon, and
the vLBR event is released at the next vCPU sched-in.
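The deferred release described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
KVM code: struct lbr_model and the model_* functions are hypothetical
names, "event" stands in for the vLBR perf event, and "guest_lbr_en"
stands in for the guest DEBUGCTLMSR_LBR bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-vCPU LBR state (not struct lbr_desc). */
struct lbr_model {
	bool event;        /* a vLBR event currently exists */
	bool in_use;       /* guest enabled LBR since the last sched-in check */
	bool guest_lbr_en; /* models the guest DEBUGCTLMSR_LBR bit */
	int releases;      /* number of times the vLBR event was released */
};

/* Models an intercepted guest write to DEBUGCTL. */
static void model_write_debugctl(struct lbr_model *m, bool lbr_enable)
{
	m->guest_lbr_en = lbr_enable;
	if (lbr_enable) {
		m->in_use = true; /* remember usage in this time slice */
		m->event = true;  /* create the event if it is missing */
	}
}

/* Models the cleanup check performed at vCPU sched-in. */
static void model_sched_in(struct lbr_model *m)
{
	if (m->guest_lbr_en)
		return;
	/* Release only if LBR stayed disabled for a whole time slice. */
	if (!m->in_use && m->event) {
		m->event = false;
		m->releases++;
	}
	m->in_use = false;
}

/* Walks the enable, disable, sched-in, sched-in sequence. */
static int model_demo(void)
{
	struct lbr_model m = { 0 };

	model_write_debugctl(&m, true);
	model_write_debugctl(&m, false);
	model_sched_in(&m); /* first sched-in: release is deferred */
	assert(m.event);
	model_sched_in(&m); /* second sched-in: event is released */
	assert(!m.event);
	return m.releases;
}
```

In this model the first sched-in after the guest disables LBR only
clears in_use, so a guest that is still reading LBR MSRs keeps its
event; the release happens one sched-in later.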

Guest LBR MSR accesses should finish within two vCPU sched time slices;
otherwise it is probably a guest LBR driver bug and cannot be supported
by this commit.

Signed-off-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxx>
---
 arch/x86/kvm/vmx/pmu_intel.c | 10 ++++++++--
 arch/x86/kvm/vmx/vmx.c       | 12 +++++++++---
 arch/x86/kvm/vmx/vmx.h       |  2 ++
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f2efa0bf7ae8..76d7bd8e4fc6 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -628,6 +628,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 	lbr_desc->records.nr = 0;
 	lbr_desc->event = NULL;
 	lbr_desc->msr_passthrough = false;
+	lbr_desc->in_use = false;
 }
 
 static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -761,8 +762,13 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
 
 static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 {
-	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
-		intel_pmu_release_guest_lbr_event(vcpu);
+	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)) {
+		if (!lbr_desc->in_use)
+			intel_pmu_release_guest_lbr_event(vcpu);
+		lbr_desc->in_use = false;
+	}
 }
 
 void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 72e3943f3693..4056e19266b5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2238,9 +2238,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			get_vmcs12(vcpu)->guest_ia32_debugctl = data;
 
 		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
-		if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
-		    (data & DEBUGCTLMSR_LBR))
-			intel_pmu_create_guest_lbr_event(vcpu);
+
+		if (intel_pmu_lbr_is_enabled(vcpu) && (data & DEBUGCTLMSR_LBR)) {
+			struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+			lbr_desc->in_use = true;
+			if (!lbr_desc->event)
+				intel_pmu_create_guest_lbr_event(vcpu);
+		}
+
 		return 0;
 	}
 	case MSR_IA32_BNDCFGS:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c2130d2c8e24..547edeb52d09 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -107,6 +107,8 @@ struct lbr_desc {
 
 	/* True if LBRs are marked as not intercepted in the MSR bitmap */
 	bool msr_passthrough;
+
+	bool in_use;
 };
 
 /*
-- 
2.34.1