Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh

Like Xu <like.xu.linux@xxxxxxxxx> · Wed, 9 Feb 2022 18:18:45 +0800

On 4/2/2022 9:01 pm, Jim Mattson wrote:
On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@xxxxxxx> wrote:

On 03-Feb-22 11:25 PM, Jim Mattson wrote:
On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@xxxxxxx> wrote:

Hi Jim,

On 03-Feb-22 9:39 AM, Jim Mattson wrote:
On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@xxxxxxx> wrote:

Perf counters may overcount for a list of Retire Based Events. Implement
the workaround for Zen3 Family 19h Model 00-0Fh processors as suggested in
the Revision Guide[1]:

   To count the non-FP affected PMC events correctly:
     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
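
For illustration, the two bit operations above can be sketched as a small
helper. This is only a sketch of the register arithmetic, not the patch
itself: the macro and function names are hypothetical, and only the bit
positions (43 and 20) come from the Revision Guide text quoted above.

```c
#include <stdint.h>

/* Bit positions as quoted from the Revision Guide; names are hypothetical. */
#define ERRATUM_1292_SET_BIT   (1ULL << 43)  /* PERF_CTL2[43] must be 1b */
#define ERRATUM_1292_CLEAR_BIT (1ULL << 20)  /* PERF_CTL2[20] must be 0b */

/*
 * Hypothetical helper: take the event-select value a counting event would
 * otherwise program and return the value to write to PERF_CTL2.
 */
static inline uint64_t erratum_1292_fixup(uint64_t perf_ctl2)
{
	perf_ctl2 |= ERRATUM_1292_SET_BIT;
	perf_ctl2 &= ~ERRATUM_1292_CLEAR_BIT;
	return perf_ctl2;
}
```

In the x86 event-select layout bit 20 is the counter-overflow interrupt
enable, which would be consistent with the workaround applying to counting
events only.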

Note that the specified workaround applies only to counting events and
not to sampling events. Thus sampling events will continue to function
as is.

Although the issue exists on all previous Zen revisions, the workaround
is different and thus not included in this patch.

This patch needs Like's patch[2] to make it work in a KVM guest.

IIUC, this patch along with Like's patch actually breaks PMU
virtualization for a kvm guest.

Suppose I have some code which counts event 0xC2 [Retired Branch
Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
percentage of my branch instructions are taken. On hardware that
suffers from erratum 1292, both counters may overcount, but if the
inaccuracy is small, then my final result may still be fairly close to
reality.

With these patches, if I run that same code in a kvm guest, it looks
like one of those events will be counted on PMC2 and the other won't
be counted at all. So, when I calculate the percentage of branch
instructions taken, I either get 0 or infinity.
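
To make the failure mode concrete, here is a minimal sketch of the ratio
computation described above. The function name is hypothetical, the counter
values are supplied by the caller, and IEEE 754 floating-point division is
assumed (so a zero denominator with a non-zero numerator yields infinity).

```c
#include <stdint.h>

/*
 * Hypothetical helper: fraction of branches that were taken.
 *   retired_branches: raw count of event 0xC2 (read from PMC0)
 *   taken_branches:   raw count of event 0xC4 (read from PMC1)
 * Assumes IEEE 754 doubles; no guard against a zero denominator, to mirror
 * code that trusts both counters to be running.
 */
static double taken_branch_ratio(uint64_t retired_branches,
				 uint64_t taken_branches)
{
	return (double)taken_branches / (double)retired_branches;
}
```

If the 0xC4 event is never scheduled, taken_branches stays 0 and the result
is 0. If the 0xC2 event is never scheduled, retired_branches stays 0 and the
result is infinity.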

Events get multiplexed internally. See the quick test below, which I ran
inside the guest. My host is running with my patch plus Like's, and the
guest is running with only my patch.

Your guest may be multiplexing the counters. The guest I posited does not.

It would be helpful if you can provide an example.

Perf on any current Linux distro (i.e. without your fix).

The patch for erratum #1292 (like most hw issues or vulnerabilities) should be
applied to both the host and guest.

For unpatched guests on a patched host, is_sampling_event() will return
true for the KVM-created perf_events, because get_sample_period() gives
them a sample period.
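
A simplified user-space model of that interaction follows. It is a sketch
under assumptions, not kernel code: all names are hypothetical stand-ins, and
it assumes that the sample period is derived from the guest counter value as
"distance to overflow" and that a sampling event is one with a non-zero
sample period.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model (assumption): the period is the number of events left before the
 * guest counter overflows; a counter value of 0 maps to a full wrap.
 */
static uint64_t model_get_sample_period(uint64_t counter, uint64_t bitmask)
{
	uint64_t period = (0 - counter) & bitmask;

	if (!period)
		period = bitmask + 1;
	return period;
}

/* Model (assumption): "sampling" means a non-zero sample period. */
static bool model_is_sampling(uint64_t sample_period)
{
	return sample_period != 0;
}

/* Would a KVM-created event for this guest counter look like sampling? */
static bool model_kvm_event_is_sampling(uint64_t counter, uint64_t bitmask)
{
	return model_is_sampling(model_get_sample_period(counter, bitmask));
}
```

Under this model the period is never zero, so every KVM-created event looks
like a sampling event regardless of how the guest uses the counter.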

I think we (KVM) have an inherent limitation: we cannot tell whether guest
counters are being used in counting mode or sampling mode, since the
difference lies purely in how software uses them.

I hope that you are not saying that kvm's *thread-pinned* perf events
are not being multiplexed at the host level, because that completely
breaks PMU virtualization.

IIUC, multiplexing happens inside the guest.

I'm not sure that multiplexing is the answer. Extrapolation may
introduce greater imprecision than the erratum.
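
For reference, the extrapolation in question is usually the linear scaling
that perf tooling applies to a multiplexed event. A minimal sketch, with a
hypothetical function name and caller-supplied values:

```c
#include <stdint.h>

/*
 * Estimate the full-interval count of a multiplexed event:
 *   estimate = raw * time_enabled / time_running
 * raw:          count observed while the event was on a hardware counter
 * time_enabled: time the event was enabled
 * time_running: time the event was actually scheduled on a counter
 * Returns 0 if the event never ran, since nothing can be extrapolated.
 */
static uint64_t scale_multiplexed_count(uint64_t raw, uint64_t time_enabled,
					uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return (uint64_t)((double)raw * (double)time_enabled /
			  (double)time_running);
}
```

The estimate assumes the event rate while the event was descheduled matches
the rate while it was running. For a bursty workload that assumption can be
far off, which is the imprecision being weighed against the erratum here.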

If you run the same test on the patched host, the PMC2 will be
used in a multiplexing way. This is no different.

If you count something like "instructions retired" three ways:
1) Unfixed counter
2) PMC2 with the fix
3) Multiplexed on PMC2 with the fix

Is (3) always more accurate than (1)?

The loss of accuracy is due to a reduction in the number of trustworthy counters,
not to these two workaround patches. Any multiplexing (whether on the host or
in the guest) will result in a loss of accuracy. Right?

I'm not sure whether we should provide a sysfs knob for (1). Is there a
precedent for this?

Thanks,
Ravi