Re: PMU virtualization and AMD erratum 1292


 



On 15/1/2022 4:02 am, Jim Mattson wrote:
 From AMD erratum 1292:

I see quite a few errata in AMD's products concerning PMU counters.

Given the number of affected machines in the real world,
this really needs to be addressed. Thanks for pointing it out.


The processor may experience sampling inaccuracies that cause the
following performance counters to overcount retire-based events.
  • PMCx0C0 [Retired Instructions]
  • PMCx0C1 [Retired Uops]
  • PMCx0C2 [Retired Branch Instructions]
  • PMCx0C3 [Retired Branch Instructions Mispredicted]
  • PMCx0C4 [Retired Taken Branch Instructions]
  • PMCx0C5 [Retired Taken Branch Instructions Mispredicted]
  • PMCx0C8 [Retired Near Returns]
  • PMCx0C9 [Retired Near Returns Mispredicted]
  • PMCx0CA [Retired Indirect Branch Instructions Mispredicted]
  • PMCx0CC [Retired Indirect Branch Instructions]
  • PMCx0D1 [Retired Conditional Branch Instructions]
  • PMCx1C7 [Retired Mispredicted Branch Instructions due to Direction Mismatch]
  • PMCx1D0 [Retired Fused Branch Instructions]

The recommended workaround is:

Alternatively, set the BIOS Setup Option "IBS hardware workaround"
(not recommended for production due to its negative performance impact).


To count the non-FP affected PMC events correctly:
  • Use Core::X86::Msr::PERF_CTL2 to count the events, and
  • Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
  • Program Core::X86::Msr::PERF_CTL2[20] to 0b.
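
As a rough illustration of the three steps above, the sketch below builds a value that could be written to PERF_CTL2. The helper and macro names are hypothetical. The field positions for the event select, unit mask, USR/OS and EN bits are assumed from the usual AMD PERF_CTL layout; only the handling of bits 43 and 20 comes from the erratum text.

```c
#include <stdint.h>

/* Assumed AMD PERF_CTL fields (macro names are illustrative only). */
#define CTL_USR   (1ULL << 16)  /* count in user mode */
#define CTL_OS    (1ULL << 17)  /* count in kernel mode */
#define CTL_BIT20 (1ULL << 20)  /* must be 0b per erratum 1292 */
#define CTL_EN    (1ULL << 22)  /* counter enable */
#define CTL_BIT43 (1ULL << 43)  /* must be 1b per erratum 1292 */

/* Hypothetical helper: encode a 12-bit event select and an 8-bit unit mask. */
static uint64_t perf_ctl_encode(uint32_t event, uint8_t umask)
{
	uint64_t v = 0;

	v |= (uint64_t)(event & 0xff);              /* EventSelect[7:0]  in bits 7:0   */
	v |= (uint64_t)umask << 8;                  /* UnitMask          in bits 15:8  */
	v |= (uint64_t)((event >> 8) & 0xf) << 32;  /* EventSelect[11:8] in bits 35:32 */
	return v | CTL_USR | CTL_OS | CTL_EN;
}

/* Apply the erratum 1292 adjustments to a value destined for PERF_CTL2. */
static uint64_t erratum_1292_fixup(uint64_t ctl)
{
	ctl |= CTL_BIT43;   /* program PERF_CTL2[43] to 1b */
	ctl &= ~CTL_BIT20;  /* program PERF_CTL2[20] to 0b */
	return ctl;
}
```

For example, `erratum_1292_fixup(perf_ctl_encode(0x0c0, 0))` would give the selector for PMCx0C0 [Retired Instructions] with the workaround bits applied. The result is only meaningful when written to PERF_CTL2, not to any other counter's control register.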

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 12d8b301065a..6a7638043066 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,13 @@
 #include "pmu.h"
 #include "svm.h"

+/* AMD erratum 1292 */
+static inline bool cpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+		guest_cpuid_model(vcpu) < 0x10;
+}
+
 enum pmu_type {
 	PMU_TYPE_COUNTER = 0,
 	PMU_TYPE_EVNTSEL,
@@ -252,6 +259,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct kvm_pmc *pmc;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u64 reserved_bits = pmu->reserved_bits;

 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -264,7 +272,9 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		if (pmc->idx == 2 && cpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			reprogram_gp_counter(pmc, data);
 			return 0;
 		}


It's unfortunate that kvm's PMU virtualization completely circumvents
any attempt to employ the recommended workaround. Admittedly, bit 43
is "reserved," and it would be foolish for a hypervisor to let a guest
set a reserved bit in a host MSR.

It's easy for KVM to stop treating PERF_CTL2[43] as reserved,
but only for AMD Family 19h Models 00h-0Fh guests.

Obviously, such guests need to be updated to use the workaround, after which
the bit can be accessed safely. Don't worry about legacy guests; see below.
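
To make the effect of the reserved-bit change easier to see outside the kernel, here is a minimal standalone model of the check. The `struct guest_cpu` type, the function names and the example reserved mask are all hypothetical stand-ins; the real code uses `guest_cpuid_family()`, `guest_cpuid_model()` and `pmu->reserved_bits` as in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Stand-in for the guest CPUID state that KVM would consult (hypothetical). */
struct guest_cpu {
	unsigned int family;
	unsigned int model;
};

/* Same condition as cpu_overcount_retire_events(): Family 19h, Models 00h-0Fh. */
static bool overcounts_retire_events(const struct guest_cpu *cpu)
{
	return cpu->family == 0x19 && cpu->model < 0x10;
}

/* True if the guest may write `data` to the event selector of counter `idx`. */
static bool evtsel_write_allowed(const struct guest_cpu *cpu, unsigned int idx,
				 uint64_t reserved_bits, uint64_t data)
{
	/* Bit 43 becomes writable on PERF_CTL2 only, and only on affected parts. */
	if (idx == 2 && overcounts_retire_events(cpu))
		reserved_bits &= ~BIT_ULL(43);
	return !(data & reserved_bits);
}
```

In this model a write with bit 43 set is accepted only for counter 2 of an affected guest; the same write to any other counter, or on any other CPU model, is still rejected as touching a reserved bit.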

But, even the first recommendation
is impossible under KVM, because the host's perf subsystem actually
decides which hardware counter is going to be used, regardless of what
the guest asks for.

First, the host perf subsystem needs to be patched to implement this workaround
(the AMD folks have been notified).

The patched host perf will schedule all retire events onto counter 2 as long as
the requested event_select and unit_mask match an entry in the workaround table.

This works for both host-created and KVM-created perf_events, so the retire-event
counters of all legacy guests will use host counter 2, and sampling
(without host counter multiplexing) will remain accurate.
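
The scheduling constraint could look roughly like the sketch below. This is not the actual host perf patch: the table, the function names and the six-counter assumption are illustrative, and unit_mask matching is left out because the erratum text quoted here lists only event selects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COUNTERS 6  /* assumption: six general-purpose core counters */

/* Event selects listed in erratum 1292 (unit masks are not modeled here). */
static const uint16_t affected_events[] = {
	0x0c0, 0x0c1, 0x0c2, 0x0c3, 0x0c4, 0x0c5, 0x0c8,
	0x0c9, 0x0ca, 0x0cc, 0x0d1, 0x1c7, 0x1d0,
};

static bool is_affected_event(uint16_t event)
{
	size_t n = sizeof(affected_events) / sizeof(affected_events[0]);

	for (size_t i = 0; i < n; i++)
		if (affected_events[i] == event)
			return true;
	return false;
}

/* Bitmask of host counters on which the event may be scheduled. */
static uint32_t allowed_counter_mask(uint16_t event)
{
	if (is_affected_event(event))
		return 1u << 2;               /* counter 2 only */
	return (1u << NUM_COUNTERS) - 1;      /* any counter */
}
```

Under such a constraint, two affected events requested at the same time would both compete for counter 2, so they could only be counted by time-sharing that one counter.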


Am I the only one bothered by this?
With this workaround, multiplexing becomes easier to trigger, and even now
the guest does not perceive multiplexing correctly.


