Hi Ricardo, Marc,

On 8/5/22 02:41, Ricardo Koller wrote:
> There are some tests that fail when running on bare metal (including a
> passthrough prototype). There are three issues with the tests. The
> first one is that there are some missing isb()'s between enabling event
> counting and the actual counting. This wasn't an issue on KVM as
> trapping on registers served as context synchronization events. The
> second issue is that some tests assume that registers reset to 0. And
> finally, the third issue is that overflowing the low counter of a
> chained event sets the overflow flag in PMVOS and some tests fail by
> checking for it not being set.
>
> Addressed all comments from the previous version:
> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@xxxxxxxxxx/T/#t
> - adding missing isb() and fixed the commit message (Alexandru).
> - fixed wording of a report() check (Andrew).
>
> Thanks!
> Ricardo
>
> Ricardo Koller (3):
>   arm: pmu: Add missing isb()'s after sys register writing
>   arm: pmu: Reset the pmu registers before starting some tests
>   arm: pmu: Check for overflow in the low counter in chained counters
>     tests
>
>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 39 insertions(+), 17 deletions(-)
>

While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
Fixing chained events, and PMUv3p5 support' I noticed that I get kvm unit
test failures on some machines. This does not seem related to those
series though, since I was able to get the failures without them.

The failures happen on an Amberwing machine, for instance, with the
pmu-chain-promotion test.

While investigating further I noticed there is a lot of variability in
the kvm unit test mem_access_loop() count. For instance, I can get
counter = 0x1F on the first iteration and 0x96 on the subsequent ones.
While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
expecting the counter to be close to 20. It is on some HW.

	for (int i = 0; i < 20; i++) {
		write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
		write_sysreg_s(0x1, PMCNTENSET_EL0);
		write_regn_el0(pmevcntr, 0, 0);
		isb();
		mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
		isb();
		report_info("iter %d, MEM_ACCESS counter #0 has value 0x%lx",
			    i, read_regn_el0(pmevcntr, 0));
		write_sysreg_s(0x0, PMCNTENCLR_EL0);
	}

[I know there are some useless isb's by the way, but that's just debug
code.]

gives

INFO: PMU version: 0x1
INFO: PMU implementer/ID code: 0x51("Q")/0
INFO: Implements 8 event counters
INFO: pmu: pmu-chain-promotion: iter 0, MEM_ACCESS counter #0 has value 0x1f
INFO: pmu: pmu-chain-promotion: iter 1, MEM_ACCESS counter #0 has value 0x96 <--- ?
INFO: pmu: pmu-chain-promotion: iter 2, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 3, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 4, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 5, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 6, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 7, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 8, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 9, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 10, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 11, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 12, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 13, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 14, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 15, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 16, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 17, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 18, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 19, MEM_ACCESS counter #0 has value 0x96

If I run the following sequence before the previous one:

	for (int i = 0; i < 20; i++) {
		write_regn_el0(pmevtyper, 0, SW_INCR | PMEVTYPER_EXCLUDE_EL0);
		write_sysreg_s(0x1, PMCNTENSET_EL0);
		write_regn_el0(pmevcntr, 0, 0);

		set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
		for (int j = 0; j < 20; j++)
			write_sysreg(0x1, pmswinc_el0);
		set_pmcr(pmu.pmcr_ro);

		report_info("iter %d, 20 x SW_INCRs counter #0 has value 0x%lx",
			    i, read_regn_el0(pmevcntr, 0));
		write_sysreg_s(0x0, PMCNTENCLR_EL0);
	}

I get

INFO: pmu: pmu-chain-promotion: iter 0, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 1, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 2, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 3, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 4, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 5, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 6, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 7, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 8, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 9, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 10, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 11, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 12, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 13, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 14, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 15, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 16, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 17, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 18, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 19, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 0, MEM_ACCESS counter #0 has value 0x96 <---
INFO: pmu: pmu-chain-promotion: iter 1, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 2, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 3, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 4, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 5, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 6, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 7, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 8, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 9, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 10, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 11, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 12, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 13, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 14, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 15, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 16, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 17, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 18, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 19, MEM_ACCESS counter #0 has value 0x96

So I come to the actual question. Can we make any assumptions about the
(virtual) PMU quality/precision?
If not, the tests I originally wrote are doomed to fail on some HW (on
some others they always pass) and I need to make a decision wrt
rewriting part of them, especially those which expect an overflow after
a given number of ops. Otherwise, there is something wrong either in
the test (asm?) or in the KVM PMU emulation.

I tried to bisect because I did observe the same behavior on some older
kernels, but the bisect was not successful as the issue does not always
happen.

Thoughts?

Eric
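
PS: to make the "rewrite" option more concrete, below is a minimal sketch
of what a tolerance-based check could look like, instead of expecting an
exact count. The helper name count_within_margin() and the max_factor
parameter are hypothetical; nothing like this exists in arm/pmu.c today.
It also assumes the counter can only over-count, which matches the logs
above but is not something I can guarantee for every implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of arm/pmu.c.
 *
 * Accept a measured event count when it lies in
 * [expected, expected * max_factor]. Assumption: the PMU may
 * over-count (e.g. extra memory accesses around the loop) but
 * never under-counts. Callers are expected to pass small values,
 * so expected * max_factor does not overflow 64 bits.
 */
static bool count_within_margin(uint64_t measured, uint64_t expected,
				uint64_t max_factor)
{
	assert(expected > 0 && max_factor >= 1);

	return measured >= expected &&
	       measured <= expected * max_factor;
}
```

With the values from the logs and an expected count of 20: 0x14 passes
with max_factor = 1, 0x1f passes with max_factor = 2, while 0x96 would
need max_factor = 8. Whether such a wide margin still tests anything
useful is exactly the question above.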