Hi Marc,

On Sun, 11 Oct 2023 16:45:17 +0000, Marc Zyngier wrote:
> > The event processing flow is as follows (shown as a backtrace):
> >
> > #0 kvm_arch_vcpu_get_frame_pointer / kvm_arch_vcpu_read_virt (per arch)
> > #1 kvm_guest_get_frame_pointer / kvm_guest_read_virt
> >    <callback function pointers in `struct perf_guest_info_callbacks`>
> > #2 perf_guest_get_frame_pointer / perf_guest_read_virt
> > #3 perf_callchain_guest
> > #4 get_perf_callchain
> > #5 perf_callchain
> >
> > Between #0 and #1 is the interface between KVM and the arch-specific
> > implementation, while between #1 and #2 is the interface between perf
> > and KVM. The 1st patch implements #0. The 2nd patch extends the
> > interfaces between #1 and #2, while the 3rd patch implements #1. The
> > 4th patch implements #3 and modifies #4 and #5. The last patch is for
> > the userspace utils.
> >
> > Since arm64 hasn't yet provided some foundational infrastructure (an
> > interface for reading from a guest virtual address), the arm64
> > implementation is stubbed for now because it's a bit complex, and
> > will be implemented later.
>
> I hope you realise that such an "interface" would be, by definition,
> fragile and very likely to break in a subtle way. The only existing
> case where we walk the guest's page tables is for NV, and even that is
> extremely fragile.

For walking the guest's page tables, yes, there are only very few use
cases. Most of them are in nested virtualization and Xen.

> Given that, I really wonder why this needs to happen in the kernel.
> Userspace has all the required information to interrupt a vcpu and
> walk its current context, without any additional kernel support. What
> are the bits here that cannot be implemented anywhere else?

Thanks for pointing this out; I agree. Whether it's walking the guest's
contexts or performing an unwind, userspace can indeed accomplish these
tasks.
The only reasons I see for implementing them in the kernel are
performance and access to a broader range of PMU events. If I were to
implement these functionalities in userspace, I could have `perf kvm`
periodically access the guest through the KVM API to retrieve the
necessary information. However, interrupting a vCPU through the KVM API
from userspace may introduce higher latency (not specifically
measured), and the overhead of the syscalls could also limit the
sampling frequency. Additionally, userspace can only interrupt the vCPU
at a fixed frequency, without harnessing the richness of the PMU's
performance events. If we instead incorporate the logic into the
kernel, `perf kvm` can bind to various PMU events and sample directly
from PMU interrupts with lower overhead.

So it appears to be a tradeoff: whether it's worth introducing more
complexity into the kernel to gain access to a broader range of more
precise performance data at a lower cost. My current use case only
requires simple periodic sampling, which is sufficient for me, so I'm
open to both approaches.

> > Tianyi Liu (5):
> >   KVM: Add arch specific interfaces for sampling guest callchains
> >   perf kvm: Introduce guest interfaces for sampling callchains
> >   KVM: implement new perf interfaces
> >   perf kvm: Support sampling guest callchains
> >   perf tools: Support PERF_CONTEXT_GUEST_* flags
> >
> >  arch/arm64/kvm/arm.c | 17 +++++++++
>
> Given that there is more to KVM than just arm64 and x86, I suggest
> that you move the lack of support for this feature into the main KVM
> code.

Currently, sampling for KVM guests only covers the guest's instruction
pointer, and even that support is limited: it is available on only two
architectures (x86 and arm64). This functionality relies on a kernel
configuration option, `CONFIG_GUEST_PERF_EVENTS`, which is only enabled
on x86 and arm64.
Within the main KVM code, these interfaces are already enclosed within
`#ifdef CONFIG_GUEST_PERF_EVENTS`. Do you think this is enough?

Best regards,
Tianyi Liu