Re: [PATCH 5/5] KVM: x86/pmu: Hide guest counter updates from the VMRUN instruction

Like Xu <like.xu.linux@xxxxxxxxx> · Fri, 7 Apr 2023 16:15:24 +0800

On 7/4/2023 10:18 am, Sean Christopherson wrote:
On Fri, Mar 10, 2023, Like Xu wrote:
From: Like Xu <likexu@xxxxxxxxxxx>

When an AMD guest is counting (branch) instruction events, its vPMU should
first subtract one from any relevant enabled (branch) instruction counter
(when it precedes VMRUN and cannot be preempted) to offset the inevitable
plus-one effect of the VMRUN instruction that immediately follows.

Based on a number of micro observations (also the reason why x86_64/
pmu_event_filter_test fails on AMD Zen platforms), each VMRUN will
increment all hw-(branch)-instructions counters by 1, even if they are
only enabled for guest code. This seriously distorts guest developers'
understanding of performance based on (branch) instruction events.

If the current physical register value on the hardware is ~0x0, it triggers
an overflow in the guest world right after running VMRUN. Although this
cannot be avoided on mainstream released hardware, the resulting PMI
(if configured) will not be incorrectly injected into the guest by vPMU,
since the delayed injection mechanism for a normal counter overflow
depends only on the change of pmc->counter values.

IIUC, this is saying that KVM may get a spurious PMI, but otherwise nothing bad
will happen?

Guests have nothing to lose under this proposal; they only gain vPMI accuracy.

When the host gets an overflow interrupt caused by a VMRUN, it forwards it to KVM.
KVM does not inject it into the VM, but discards it. Those using the PMU to profile
the hypervisor itself lose an interrupt or a sample in the VMRUN context.
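
To make the accounting above concrete, here is a toy model (not the KVM code).
It assumes 48-bit counters and that a guest PMI is raised only when the
software-tracked count wraps. All toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit counters, as on AMD core PMCs. */
#define TOY_CTR_MASK ((1ULL << 48) - 1)

/*
 * Guest-visible count after one VM-entry. The hardware always adds one
 * for VMRUN; software may subtract one beforehand to cancel it out.
 */
static uint64_t toy_vmentry(uint64_t count, bool compensate)
{
	if (compensate)
		count = (count - 1) & TOY_CTR_MASK;	/* pre-decrement */
	return (count + 1) & TOY_CTR_MASK;		/* VMRUN's own +1 */
}

/* Inject a guest PMI only if the guest-visible count wrapped around. */
static bool toy_guest_pmi(uint64_t prev, uint64_t now)
{
	return now < prev;
}
```

With compensation the guest-visible count is unchanged across VM-entry, so
toy_guest_pmi() stays false even if the hardware register itself wrapped and
raised a host-side overflow interrupt.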

+static inline bool event_is_branch_instruction(struct kvm_pmc *pmc)
+{
+	return eventsel_match_perf_hw_id(pmc, PERF_COUNT_HW_INSTRUCTIONS) ||
+		eventsel_match_perf_hw_id(pmc,
+					  PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
+}
+
+static inline bool quirky_pmc_will_count_vmrun(struct kvm_pmc *pmc)
+{
+	return event_is_branch_instruction(pmc) && event_is_allowed(pmc) &&
+		!static_call(kvm_x86_get_cpl)(pmc->vcpu);

Wait, really?  VMRUN is counted if and only if it enters to a CPL0 guest?  Can
someone from AMD confirm this?  I was going to say we should just treat this as
"normal" behavior, but counting CPL0 but not CPL>0 is definitely quirky.

VMRUN is only counted on a CPL0-target (branch) instruction counter. The VMRUN
is not expected to be counted by the guest counters, regardless of the guest CPL.

This issue makes a guest CPL0-target instruction counter increase inexplicably, as if it
had been under-counting before the virtualization instructions were counted.

Treating host hypervisor instructions such as VMRUN as guest workload instructions
is already an error in itself, not "normal" behavior, and it affects guest accuracy.
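
As a rough illustration of which counters are affected, the sketch below reads
"CPL0-target" as "the event selector has the OS bit set". That reading is an
assumption; the patch itself checks the vCPU's current CPL instead. The bit
positions and the 0xC0/0xC2 event codes follow the AMD PerfEvtSel layout as I
understand it, and all toy_* / TOY_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed AMD PerfEvtSel layout; names are illustrative only. */
#define TOY_EVENTSEL_EVENT	((0xFULL << 32) | 0xFFULL) /* event select */
#define TOY_EVENTSEL_USR	(1ULL << 16)	/* count at CPL > 0 */
#define TOY_EVENTSEL_OS		(1ULL << 17)	/* count at CPL0 */
#define TOY_EVENTSEL_ENABLE	(1ULL << 22)

#define TOY_EVT_RETIRED_INSN	0xC0ULL	/* retired instructions */
#define TOY_EVT_RETIRED_BRANCH	0xC2ULL	/* retired branch instructions */

/* True if this counter would pick up the extra +1 from VMRUN. */
static bool toy_counter_sees_vmrun(uint64_t eventsel)
{
	uint64_t event = eventsel & TOY_EVENTSEL_EVENT;

	if (!(eventsel & TOY_EVENTSEL_ENABLE))
		return false;
	if (event != TOY_EVT_RETIRED_INSN && event != TOY_EVT_RETIRED_BRANCH)
		return false;
	/* Only counters that target CPL0 are affected. */
	return (eventsel & TOY_EVENTSEL_OS) != 0;
}
```

A user-only (USR without OS) instruction counter is reported as unaffected
here, which matches the statement that only CPL0-target counters see the VMRUN.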