This series of patches enables callchains for guests (used by `perf kvm`),
which holds the top spot on the perf wiki TODO list [1]. This allows users
to perform callchain or performance analysis of a guest OS from the host
using PMU events. It is also useful for guests, such as unikernels, that
lack a performance event subsystem of their own.

The event processing flow is as follows (shown as a backtrace):

  @0 kvm_arch_vcpu_get_unwind_info / kvm_arch_vcpu_read_virt (per arch impl)
  @1 kvm_guest_get_unwind_info / kvm_guest_read_virt
     <callback function pointers in `struct perf_guest_info_callbacks`>
  @2 perf_guest_get_unwind_info / perf_guest_read_virt
  @3 perf_callchain_guest
  @4 get_perf_callchain
  @5 perf_callchain

The boundary between @0 and @1 is the interface between KVM and the
arch-specific implementation, while the boundary between @1 and @2 is the
interface between perf and KVM. Patch 1 implements @0. Patch 2 extends
the interfaces between @1 and @2, and patch 3 implements @1. Patch 4
implements @3 and modifies @4 and @5. The last patch adds support to the
userspace tools.

Since arm64 does not yet provide some foundational infrastructure (an
interface for reading from a guest virtual address), which is somewhat
complex to add, the arm64 implementation is stubbed for now and will be
implemented later.

For safety, this feature accesses guests read-only and never injects
page faults into them, so profiling does not interfere with the guests.
In extremely rare cases, if the guest is modifying its page tables, the
data read may be incorrect. Additionally, programs in the guest OS that
are built without frame pointers may also produce erroneous data. Such
erroneous data eventually appears as `[unknown]` entries in the report.
For profiling, this is sufficient as long as most of the records are
correct.
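To make the read-only unwinding behaviour described above concrete, here
is a minimal sketch in C of a frame-pointer walk over guest memory,
assuming the x86-64 frame layout (saved caller frame pointer at [fp],
return address at [fp + 8]). `struct guest_unwind_info`,
`struct guest_callbacks`, `guest_callchain()` and the `fake_*` guest are
hypothetical illustrations that loosely mirror the roles of @1 to @3;
they are not the interfaces added by this series. The key property shown
is that a failed read simply truncates the chain instead of faulting the
guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: minimal state needed to start a frame-pointer unwind. */
struct guest_unwind_info {
	uint64_t ip;            /* guest instruction pointer at sample time */
	uint64_t frame_pointer; /* guest RBP (x86-64 frame-pointer ABI) */
};

/* Hypothetical callback table, loosely modeled on the role of
 * struct perf_guest_info_callbacks; not the real kernel layout. */
struct guest_callbacks {
	bool (*get_unwind_info)(struct guest_unwind_info *info);
	/* Read-only access to guest virtual memory: returns false instead
	 * of injecting a page fault when the address is not readable. */
	bool (*read_virt)(uint64_t gva, void *buf, size_t len);
};

#define MAX_DEPTH 16

/* Walk x86-64 frame records: [fp] holds the caller's fp and [fp + 8]
 * holds the return address. Returns the number of entries in ips. */
static size_t guest_callchain(const struct guest_callbacks *cb,
			      uint64_t *ips, size_t max)
{
	struct guest_unwind_info info;
	size_t n = 0;

	assert(cb != NULL && ips != NULL);
	if (max == 0 || !cb->get_unwind_info(&info))
		return 0;
	ips[n++] = info.ip;

	uint64_t fp = info.frame_pointer;
	while (n < max && fp != 0) {
		uint64_t frame[2]; /* [0]: caller fp, [1]: return address */

		if (!cb->read_virt(fp, frame, sizeof(frame)))
			break; /* unreadable: truncate, never fault the guest */
		ips[n++] = frame[1];
		if (frame[0] <= fp)
			break; /* stack grows down: caller frames must be higher */
		fp = frame[0];
	}
	return n;
}

/* ---- Fake guest, used only to exercise the walker ---- */
#define FAKE_STACK_BASE 0x7000u

static uint64_t fake_stack[6] = {
	0x7010, 0xffffffff81000100ull, /* frame at 0x7000 */
	0x7020, 0xffffffff81000200ull, /* frame at 0x7010 */
	0,      0xffffffff81000300ull, /* outermost frame at 0x7020 */
};

static struct guest_unwind_info fake_regs = {
	.ip = 0xffffffff81000000ull,
	.frame_pointer = FAKE_STACK_BASE,
};

static bool fake_get_unwind_info(struct guest_unwind_info *info)
{
	*info = fake_regs;
	return true;
}

static bool fake_read_virt(uint64_t gva, void *buf, size_t len)
{
	if (gva < FAKE_STACK_BASE)
		return false;
	uint64_t off = gva - FAKE_STACK_BASE;
	if (off > sizeof(fake_stack) || len > sizeof(fake_stack) - off)
		return false; /* outside the fake guest's mapped stack */
	memcpy(buf, (const unsigned char *)fake_stack + off, len);
	return true;
}

static const struct guest_callbacks fake_callbacks = {
	.get_unwind_info = fake_get_unwind_info,
	.read_virt = fake_read_virt,
};
```

For example, `guest_callchain(&fake_callbacks, ips, MAX_DEPTH)` yields
the sampled IP followed by three return addresses, while pointing
`fake_regs.frame_pointer` at an unmapped address yields only the sampled
IP. The `frame[0] <= fp` check is a simple guard against corrupted or
cyclic frame chains.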
Regarding the necessity of implementing this in the kernel: we could
instead implement it in userspace, using the KVM APIs to interrupt the
guest and perform the unwinding. However, that approach introduces
higher latency, and the syscall overhead could limit the sampling
frequency. Moreover, userspace appears to be able to interrupt the vCPU
only at a fixed frequency, which does not leverage the full range of the
PMU's performance events. By contrast, with the logic in the kernel,
`perf kvm` can bind to various PMU events and handle them faster,
directly in the PMU interrupt.

Tested with both Linux and unikernels as guests: `perf script` correctly
shows the callchains, and FlameGraphs can also be generated with this
series and [2].

[1] https://perf.wiki.kernel.org/index.php/Todo
[2] https://github.com/brendangregg/FlameGraph

v1: https://lore.kernel.org/kvm/SYYP282MB108686A73C0F896D90D246569DE5A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Changes since v1:
- Posted the complete implementation.
- Updated some code based on Sean's feedback.

v2: https://lore.kernel.org/kvm/SY4P282MB1084ECBCC1B176153B9E2A009DCFA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Changes since v2:
- Refactored the interface, packaging the info required for unwinding
  into a struct.
- Resolved some type mismatches.
- Provided more explanations based on the feedback on v2.
- Performed more tests.
Tianyi Liu (5):
  KVM: Add arch specific interfaces for sampling guest callchains
  perf kvm: Introduce guest interfaces for sampling callchains
  KVM: implement new perf callback interfaces
  perf kvm: Support sampling guest callchains
  perf tools: Support PERF_CONTEXT_GUEST_* flags

 MAINTAINERS                         |  1 +
 arch/arm64/kvm/arm.c                | 12 ++++++
 arch/x86/events/core.c              | 63 ++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c                  | 24 +++++++++++
 include/linux/kvm_host.h            |  5 +++
 include/linux/perf_event.h          | 20 ++++++++-
 include/linux/perf_kvm.h            | 18 +++++++++
 kernel/bpf/stackmap.c               |  8 ++--
 kernel/events/callchain.c           | 27 ++++++++++++-
 kernel/events/core.c                | 17 +++++++-
 tools/perf/builtin-timechart.c      |  6 +++
 tools/perf/util/data-convert-json.c |  6 +++
 tools/perf/util/machine.c           |  6 +++
 virt/kvm/kvm_main.c                 | 22 ++++++++++
 14 files changed, 218 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/perf_kvm.h


base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
-- 
2.34.1