On 25/6/2023 12:03 pm, Zhang, Xiong Y wrote:
On Fri, Jun 16, 2023, Xiong Zhang wrote:
Perf defines four types of perf event: per-cpu pinned event, per-process
pinned event, per-cpu event and per-process event; their priorities run
from high to low. The vLBR event is a per-process pinned event. So during
the vm exit handler, if the vLBR event preempts a lower-priority perf LBR
event, perf will disable LBR and let the guest control LBR, or if the vLBR
event is preempted by a higher-priority perf LBR event, perf will enable
LBR. In a word, LBR status may be changed during the vm exit handler.
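To make the ordering above concrete, here is a minimal sketch in plain C. It is not perf's scheduling code; the enum, the function names and the tie-breaking choice are all hypothetical and only encode the priority order stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names. Lower value means higher priority, as listed above. */
enum ev_class {
	EV_CPU_PINNED = 0,
	EV_TASK_PINNED,		/* the vLBR event belongs to this class */
	EV_CPU_FLEXIBLE,
	EV_TASK_FLEXIBLE,
};

/* True if an event of class a preempts an event of class b. */
static bool preempts(enum ev_class a, enum ev_class b)
{
	return a < b;
}

/*
 * Who ends up controlling LBR when the vLBR event competes with a host
 * LBR event of class host_ev. Ties are not described in the text above;
 * this sketch arbitrarily gives them to the host.
 */
static const char *lbr_owner(enum ev_class host_ev)
{
	return preempts(EV_TASK_PINNED, host_ev) ? "guest" : "host";
}

/* Small self-check of the two cases the commit message describes. */
static void lbr_owner_demo(void)
{
	/* Higher-priority host event: perf enables LBR for the host. */
	assert(strcmp(lbr_owner(EV_CPU_PINNED), "host") == 0);
	/* Lower-priority host event: perf disables LBR, guest controls it. */
	assert(strcmp(lbr_owner(EV_CPU_FLEXIBLE), "guest") == 0);
}
```

Calling lbr_owner_demo() from a test harness exercises both directions of the preemption.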
MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling. kvm saves its value into
vmx->host_debugctlmsr in vcpu_load(), and restores it from
vmx->host_debugctlmsr immediately after vm exit. Since
MSR_IA32_DEBUGCTLMSR[0] could be changed during the vm exit handler, the
saved value vmx->host_debugctlmsr could be wrong. So this commit saves
MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry
to reflect the real hardware value.
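The stale-snapshot problem described above can be sketched with a small userspace model. This is an illustration under stated assumptions, not kvm code: fake_debugctl stands in for the MSR that the kernel reads with get_debugctlmsr(), and fake_perf_ipi(), restored_value() and the macro name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEBUGCTL_LBR (1ULL << 0)	/* bit 0: LBR enable */

/* Stand-in for the hardware MSR (assumption: plain variable, single cpu). */
static uint64_t fake_debugctl;

/* Stand-in for the perf IPI handler changing the host LBR enable bit. */
static void fake_perf_ipi(int host_lbr_active)
{
	if (host_lbr_active)
		fake_debugctl |= DEBUGCTL_LBR;
	else
		fake_debugctl &= ~DEBUGCTL_LBR;
}

/*
 * Returns the value that would be written back after vm exit.
 * snapshot_late == 0: snapshot only at vcpu_load() time (old behaviour).
 * snapshot_late == 1: snapshot again just before vm entry (this patch).
 */
static uint64_t restored_value(int snapshot_late)
{
	uint64_t saved;

	fake_debugctl = 0;		/* vLBR active, host LBR off */
	saved = fake_debugctl;		/* vcpu_load() snapshot */
	fake_perf_ipi(1);		/* host LBR event preempts vLBR */
	if (snapshot_late)
		saved = fake_debugctl;	/* re-read right before vm entry */
	return saved;
}

static void stale_snapshot_demo(void)
{
	/* Early snapshot misses the change made by the IPI handler. */
	assert((restored_value(0) & DEBUGCTL_LBR) == 0);
	/* Late snapshot sees it. */
	assert((restored_value(1) & DEBUGCTL_LBR) != 0);
}
```

The model only covers a change that lands between vcpu_load() and vm entry; it says nothing about changes that arrive after the late snapshot.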
Signed-off-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxx>
---
arch/x86/kvm/vmx/vmx.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..5ca61a26d0d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
atomic_switch_perf_msrs(vmx);
if (intel_pmu_lbr_is_enabled(vcpu))
vmx_passthrough_lbr_msrs(vcpu);
+ vmx->host_debugctlmsr = get_debugctlmsr();
Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient. If
the DEBUG_CTL value is being changed synchronously, then just fix whatever
KVM path leads to a change in the host value. If DEBUG_CTL is being changed
asynchronously, then I'm guessing the change is coming from NMI context,
which means that KVM is buggy no matter how close we put this to VM-Enter.
When a perf event reschedule is needed on a physical cpu, the perf scheduler sends an IPI to the target cpu, and LBR is enabled or disabled in the IPI handler according to the attributes of the active event.
If the vLBR event is active, LBR is disabled in the IPI handler.
If a host LBR event is active, LBR is enabled in the IPI handler; this can happen when a host LBR event preempts the vLBR event during the vm exit handler.
DEBUG_CTL[0] changes asynchronously in the perf IPI handler. Host irqs are disabled near VM-Enter, so the IPI can't be delivered there, and host DEBUG_CTL[0] can't change before kvm re-enables host irqs.
A perf event counter overflow (PMI) is an NMI, but this NMI handler doesn't change LBR status, so the host_debugctlmsr value saved by kvm is still correct after the PMI handler.
thanks
This is not true. One example is Freezing LBRs on PMI (bit 11) in the host NMI ctx.
For "Legacy Freeze_LBR_on_PMI" feature on a host, "the LBR is frozen on the
overflowed condition of the buffer area, the processor clears the LBR bit
(bit 0) in IA32_DEBUGCTL."
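A toy model of that freeze behaviour, to show why a snapshot taken with irqs off can still go stale. This is only a sketch under assumptions: fake_debugctl stands in for the MSR, fake_pmi() models just the quoted "clears the LBR bit" step, and all names are hypothetical. It does not model where in the VM-entry path a PMI can actually land.

```c
#include <assert.h>
#include <stdint.h>

#define DEBUGCTL_LBR			(1ULL << 0)	/* LBR enable */
#define DEBUGCTL_FREEZE_LBRS_ON_PMI	(1ULL << 11)	/* legacy freeze */

/* Stand-in for the hardware MSR (assumption: plain variable). */
static uint64_t fake_debugctl;

/* Models the legacy freeze: on PMI, bit 0 is cleared if bit 11 is set. */
static void fake_pmi(void)
{
	if (fake_debugctl & DEBUGCTL_FREEZE_LBRS_ON_PMI)
		fake_debugctl &= ~DEBUGCTL_LBR;
}

/* Returns 1 if a snapshot taken before a PMI no longer matches the MSR. */
static int snapshot_is_stale(uint64_t initial)
{
	uint64_t saved;

	fake_debugctl = initial;
	saved = fake_debugctl;	/* snapshot, e.g. with irqs already off */
	fake_pmi();		/* an NMI is not blocked by disabling irqs */
	return saved != fake_debugctl;
}

static void freeze_demo(void)
{
	/* Freeze enabled: the PMI clears bit 0 behind the snapshot. */
	assert(snapshot_is_stale(DEBUGCTL_LBR | DEBUGCTL_FREEZE_LBRS_ON_PMI) == 1);
	/* Freeze not enabled: the PMI leaves DEBUG_CTL alone. */
	assert(snapshot_is_stale(DEBUGCTL_LBR) == 0);
}
```

In this model, moving the snapshot closer to VM-Enter narrows the window but does not close it, since the change comes from NMI context.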
Not to mention that the commit message says nothing about the effect of
this change on the other DEBUG_CTL features.
I couldn't agree with Sean more here. I think the first step is to make sure
that debugctl's functionality is not broken in either root mode or non-root
mode, followed closely by deciding what policy should be set, and how users
are notified, if host/kvm is not in a position to support either side.