On Thu, Aug 11, 2022 at 01:56:21PM +0100, Marc Zyngier wrote:
> On Wed, 10 Aug 2022 22:55:03 +0100,
> Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
> >
> > On Wed, Aug 10, 2022 at 02:33:53PM -0500, Oliver Upton wrote:
> > > Hi Ricardo,
> > >
> > > On Wed, Aug 10, 2022 at 11:46:22AM -0700, Ricardo Koller wrote:
> > > > On Fri, Aug 05, 2022 at 02:58:04PM +0100, Marc Zyngier wrote:
> > > > > Ricardo recently reported[1] that our PMU emulation was busted when it
> > > > > comes to chained events, as we cannot expose the overflow on a 32bit
> > > > > boundary (which the architecture requires).
> > > > >
> > > > > This series aims at fixing this (by deleting a lot of code), and as a
> > > > > bonus adds support for PMUv3p5, as this requires us to fix a few more
> > > > > things.
> > > > >
> > > > > Tested on A53 (PMUv3) and FVP (PMUv3p5).
> > > > >
> > > > > [1] https://lore.kernel.org/r/20220805004139.990531-1-ricarkol@xxxxxxxxxx
> > > > >
> > > > > Marc Zyngier (9):
> > > > >   KVM: arm64: PMU: Align chained counter implementation with
> > > > >     architecture pseudocode
> > > > >   KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
> > > > >   KVM: arm64: PMU: Only narrow counters that are not 64bit wide
> > > > >   KVM: arm64: PMU: Add counter_index_to_*reg() helpers
> > > > >   KVM: arm64: PMU: Simplify setting a counter to a specific value
> > > > >   KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
> > > > >   KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
> > > > >   KVM: arm64: PMU: Implement PMUv3p5 long counter support
> > > > >   KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
> > > > >
> > > > >  arch/arm64/include/asm/kvm_host.h |   1 +
> > > > >  arch/arm64/kvm/arm.c              |   6 +
> > > > >  arch/arm64/kvm/pmu-emul.c         | 372 ++++++++++--------------------
> > > > >  arch/arm64/kvm/sys_regs.c         |  65 +++++-
> > > > >  include/kvm/arm_pmu.h             |  16 +-
> > > > >  5 files changed, 208 insertions(+), 252 deletions(-)
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >
> > > >
> > > > Hi Marc,
> > > >
> > > > There is one extra potential issue with exposing PMUv3p5. I see this
> > > > weird behavior when doing passthrough ("bare metal") on the fast-model
> > > > configured to emulate PMUv3p5: the [63:32] half of the counters
> > > > overflowing at 32-bits is still incremented.
> > > >
> > > >   Fast model - ARMv8.5:
> > > >
> > > >     Assuming the initial state is even=0xFFFFFFFF and odd=0x0,
> > > >     incrementing the even counter leads to:
> > > >
> > > >     0x00000001_00000000    0x00000000_00000001    0x1
> > > >     even counter           odd counter            PMOVSET
> > > >
> > > >     Assuming the initial state is even=0xFFFFFFFF and odd=0xFFFFFFFF,
> > > >     incrementing the even counter leads to:
> > > >
> > > >     0x00000001_00000000    0x00000001_00000000    0x3
> > > >     even counter           odd counter            PMOVSET
> > >
> > > This is to be expected, actually. PMUv3p5 counters are always 64 bit,
> > > regardless of the configured overflow.
> > >
> > > DDI 0487H D8.3 Behavior on overflow
> > >
> > >   If FEAT_PMUv3p5 is implemented, 64-bit event counters are implemented,
> > >   HDCR.HPMN is not 0, and either n is in the range [0 .. (HDCR.HPMN-1)]
> > >   or EL2 is not implemented, then event counter overflow is configured
> > >   by PMCR.LP:
> > >
> > >   - When PMCR.LP is set to 0, if incrementing PMEVCNTR<n> causes an unsigned
> > >     overflow of bits [31:0] of the event counter, the PE sets PMOVSCLR[n] to 1.
> > >   - When PMCR.LP is set to 1, if incrementing PMEVCNTR<n> causes an unsigned
> > >     overflow of bits [63:0] of the event counter, the PE sets PMOVSCLR[n] to 1.
> > >
> > >   [...]
> > >
> > >   For all 64-bit counters, incrementing the counter is the same whether an
> > >   unsigned overflow occurs at [31:0] or [63:0]. If the counter increments
> > >   for an event, bits [63:0] are always incremented.
> > >
> > > Do you see this same (expected) failure w/ Marc's series?
> >
> > I don't know, I'm hitting another bug it seems.
> >
> > Just realized that KVM does not offer PMUv3p5 (with this series applied)
> > when the real hardware is only Armv8.2 (the setup I originally tried).
> > So, tried these other two setups on the fast model:
> >
> > has_arm_v8-5=1
> >
> >   # ./lkvm-static run --nodefaults --pmu pmu.flat -p pmu-chained-sw-incr
> >   # lkvm run -k pmu.flat -m 704 -c 8 --name guest-135
> >
> >   INFO: PMU version: 0x6
> >                      ^^^
> >                      PMUv3 for Armv8.5
> >   INFO: PMU implementer/ID code: 0x41("A")/0
> >   INFO: Implements 8 event counters
> >   FAIL: pmu: pmu-chained-sw-incr: overflow and chain counter incremented after 100 SW_INCR/CHAIN
> >   INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=4294967380 #1=0
> >                                            ^^^
> >                                            no overflows
> >   FAIL: pmu: pmu-chained-sw-incr: expected overflows and values after 100 SW_INCR/CHAIN
> >   INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=84 #1=-1
> >   INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=4294967380 #1=4294967295
> >   SUMMARY: 2 tests, 2 unexpected failures
>
> Hmm. I think I see what's wrong. In kvm_pmu_create_perf_event(), we
> have this:
>
> 	if (kvm_pmu_idx_is_64bit(vcpu, select_idx))
> 		attr.config1 |= 1;
>
> 	counter = kvm_pmu_get_counter_value(vcpu, select_idx);
>
> 	/* The initial sample period (overflow count) of an event. */
> 	if (kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
> 		attr.sample_period = (-counter) & GENMASK(63, 0);
> 	else
> 		attr.sample_period = (-counter) & GENMASK(31, 0);
>
> but the initial sampling period shouldn't be based on the *guest*
> counter overflow. It really is about getting to an overflow on the
> *host*, so the initial code was correct, and only the width of the
> counter matters here.

Right, I think this requires bringing back some of the chained-related
code (like update_pmc_chained() and pmc_is_chained()), because

	attr.sample_period = (-counter) & GENMASK(31, 0);

should also be used when the counter is chained.

Thanks,
Ricardo

>
> /me goes back to running the FVP...
>
> Thanks,
>
> 	M.
>
> --
> Without deviation from the norm, progress is not possible.
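
For reference, the two behaviours discussed in this thread can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not KVM code: GENMASK64() is a stand-in for the kernel's GENMASK(), and counter_increment() and initial_sample_period() are hypothetical names made up for illustration. The first function follows the quoted DDI 0487H text (bits [63:0] always increment, PMCR.LP only selects the overflow boundary); the second computes the number of events left before a counter of a given width wraps, which is what the sample period expression above evaluates to.

```c
#include <stdint.h>

/* Stand-in for the kernel's GENMASK(h, l), restricted to 64-bit values. */
#define GENMASK64(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/*
 * Hypothetical model of a PMUv3p5 event counter increment: bits [63:0]
 * always increment, and PMCR.LP only selects which boundary raises the
 * overflow flag. Returns 1 if the overflow bit would be set, 0 otherwise.
 */
static inline int counter_increment(uint64_t *counter, int pmcr_lp)
{
	*counter += 1;

	if (pmcr_lp)
		return *counter == 0;			/* wrapped bits [63:0] */

	return (*counter & GENMASK64(31, 0)) == 0;	/* wrapped bits [31:0] */
}

/*
 * Hypothetical helper: events remaining before a counter of the given
 * width wraps, mirroring (-counter) & GENMASK(n, 0) from the thread.
 */
static inline uint64_t initial_sample_period(uint64_t counter, int is_64bit_wide)
{
	uint64_t mask = is_64bit_wide ? GENMASK64(63, 0) : GENMASK64(31, 0);

	return (-counter) & mask;
}
```

Starting from 0xFFFFFFFF with PMCR.LP clear, counter_increment() leaves the counter at 0x00000001_00000000 and reports an overflow, matching the fast model output quoted above. For the same starting value, initial_sample_period() gives 1 for a 32-bit wide counter and 0xFFFFFFFF00000001 for a 64-bit wide one, so the choice of width changes when the host event first fires.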