(CC'ing Peter Maydell in case this might be of interest to qemu)

The series can be found at [1], and the kvmtool support at [2].

At the moment, the experience of running a virtual machine with a PMU on a
heterogeneous system (where there are different PMUs) varies from "just
works", if the VCPUs run only on the correct physical CPUs, to "doesn't work
at all", if the VCPUs run only on the incorrect physical CPUs, to "something
doesn't look right", if the VCPUs run some of the time on the correct
physical CPUs and some of the time on the incorrect physical CPUs.

The reason for this behaviour is that KVM creates perf events to emulate a
guest PMU, and the choice of PMU that is used to create these events is left
entirely up to the perf core, based on the hardware probe order. The first
PMU to register with perf (via perf_pmu_register()) is the PMU that will
always be chosen when creating the events (in
perf_event_create_kernel_counter() -> perf_event_alloc() ->
perf_event_init()).

Let's take the example of a rockpro64 board: CPUs 0-3 are Cortex-A53 (the
little cores), CPUs 4-5 are Cortex-A72 (the big cores), and each group has
its own PMU. When running the pmu-cycle-counter test from kvm-unit-tests on
the little cores, everything works as expected:

$ taskset -c 0-3 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

But when running the same test on the big cores:

$ taskset -c 4-5 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
FAIL: pmu: cycle-counter: Monotonically increasing cycle count
[..]
FAIL: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests, 2 unexpected failures

The same behaviour is exhibited when running under qemu.

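To make the failure mode concrete, here is a toy userspace model of the
selection described above. It is only a sketch: struct toy_pmu,
toy_pick_pmu() and toy_event_can_run() are made-up names, not kernel code,
and the model assumes that an event created on a PMU can only count on the
CPUs that PMU covers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered PMU: a name and the CPUs it covers. */
struct toy_pmu {
	const char *name;
	unsigned int cpumask;	/* bit n set => CPU n belongs to this PMU */
};

/*
 * Models the behaviour described above: whichever PMU registered first
 * (index 0 in registration order) is always the one that gets picked.
 */
static const struct toy_pmu *toy_pick_pmu(const struct toy_pmu *pmus, size_t n)
{
	return n ? &pmus[0] : NULL;
}

/*
 * Assumption of this model: an event can only be scheduled on a CPU that
 * is covered by the PMU the event was created on.
 */
static bool toy_event_can_run(const struct toy_pmu *pmu, int cpu)
{
	if (!pmu || cpu < 0 || cpu >= 32)
		return false;
	return (pmu->cpumask >> cpu) & 1u;
}
```

Using the rockpro64 layout (mask 0x0f for CPUs 0-3, mask 0x30 for CPUs 4-5):
with the Cortex-A53 PMU first in the array, toy_event_can_run() is true for
CPUs 0-3 and false for CPUs 4-5. Swapping the two entries flips the result,
which is the probe order dependency in miniature.
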
The test passes on the little cores in that particular setup because the
little cores are the "correct" cores: the PMU that perf chooses to create
the events on is the PMU associated with the little cores. The test fails on
the big cores because the events cannot be scheduled in, as the PMU is
associated with a different set of cores (merge_sched_in() exits early
because event_filter_match() returns false).

It gets even more unpredictable, as the order in which the PMUs are probed
during boot dictates which PMU is chosen for creating the events, and the
probe order can change if, for example, the order of the PMU nodes in the
DTB changes, or if the kernel is booted with asynchronous driver probing for
the armv8-pmu driver. A user can end up in a situation where pinning the VM
on a set of CPUs works just fine, and after a reboot doesn't work anymore,
without any kind of explanation or hint as to why it stopped working.

All of this is not ideal from the user's perspective, and this series aims
to improve that by adding a new PMU attribute which can be used to tell KVM
exactly on which PMU the events for the VCPU should be created. The contract
is that the user is still responsible for pinning the VCPUs on the
corresponding CPUs, and KVM will refuse to run the VCPU on a CPU with a
different PMU.

With this series, and with kvmtool support for the
KVM_ARM_VCPU_PMU_V3_SET_PMU attribute [2], running the same test as above on
the little cores, then on the big cores:

$ taskset -c 0-3 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

$ taskset -c 4-5 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

We get a saner behaviour, which is reproducible across reboots, regardless
of the probe order.

And this is what happens if the VCPU is run on a physical CPU with a
different PMU than what was set by userspace:

$ taskset -c 3-4 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
KVM_RUN failed: Exec format error

kvmtool sets the PMU for all VCPUs from the main thread; the main thread
runs on the little core (CPU3), but the VCPU is scheduled on the big core
(CPU4); there is a mismatch between the VCPU PMU and the physical CPU PMU,
and KVM returns -ENOEXEC from KVM_RUN.

[1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/pmu-big-little-fix-v1
[2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pmu-big-little-fix-v1

Alexandru Elisei (4):
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU

 Documentation/virt/kvm/api.rst          |  5 ++-
 Documentation/virt/kvm/devices/vcpu.rst | 26 +++++++++++
 arch/arm64/include/asm/kvm_host.h       |  3 ++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/arm.c                    | 15 +++++++
 arch/arm64/kvm/pmu-emul.c               | 58 +++++++++++++++++++++++--
 include/kvm/arm_pmu.h                   |  6 +++
 include/linux/perf_event.h              |  2 +-
 tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
 9 files changed, 110 insertions(+), 7 deletions(-)

-- 
2.33.1