On Sun, Dec 10, 2023, Tianyi Liu wrote:
> This patch provides support for sampling guests' callchains.
>
> The signature of `get_perf_callchain` has been modified to explicitly
> specify whether it needs to sample the host or guest callchain. Based on
> the context, `get_perf_callchain` will distribute each sampling request
> to one of `perf_callchain_user`, `perf_callchain_kernel`,
> or `perf_callchain_guest`.
>
> The reason for separately implementing `perf_callchain_user` and
> `perf_callchain_kernel` is that the kernel may utilize special unwinders
> like `ORC`. However, for the guest, we only support stackframe-based
> unwinding, so the implementation is generic and only needs to be
> separately implemented for 32-bit and 64-bit.
>
> Signed-off-by: Tianyi Liu <i.pear@xxxxxxxxxxx>
> ---
>  arch/x86/events/core.c     | 63 ++++++++++++++++++++++++++++++++------
>  include/linux/perf_event.h |  3 +-
>  kernel/bpf/stackmap.c      |  8 ++---
>  kernel/events/callchain.c  | 27 +++++++++++++++-
>  kernel/events/core.c       |  7 ++++-
>  5 files changed, 91 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 40ad1425ffa2..4ff412225217 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2758,11 +2758,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>  	struct unwind_state state;
>  	unsigned long addr;
>
> -	if (perf_guest_state()) {
> -		/* TODO: We don't support guest os callchain now */
> -		return;
> -	}
> -
>  	if (perf_callchain_store(entry, regs->ip))
>  		return;
>
> @@ -2778,6 +2773,59 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>  	}
>  }
>
> +static inline void
> +perf_callchain_guest32(struct perf_callchain_entry_ctx *entry,
> +		       const struct perf_kvm_guest_unwind_info *unwind_info)
> +{
> +	unsigned long ss_base, cs_base;
> +	struct stack_frame_ia32 frame;
> +	const struct stack_frame_ia32 *fp;
> +
> +	cs_base = unwind_info->segment_cs_base;
> +	ss_base = unwind_info->segment_ss_base;
> +
> +	fp = (void *)(ss_base + unwind_info->frame_pointer);
> +	while (fp && entry->nr < entry->max_stack) {
> +		if (!perf_guest_read_virt((unsigned long)&fp->next_frame,

This is extremely confusing and potentially dangerous.  ss_base and
unwind_info->frame_pointer are *guest* SS:RBP, i.e. this is referencing a
guest virtual address.  It works, but it _looks_ like the code is fully
dereferencing a guest virtual address in the host kernel.  And I can only
imagine what type of speculative accesses this generates.

*If* we want to support guest callchains, I think it would make more sense
to have a single hook for KVM/virtualization to fill perf_callchain_entry_ctx
(rough, untested sketch at the end of this mail).  Then there's no need for
"struct perf_kvm_guest_unwind_info", perf doesn't need a hook to read guest
memory, and KVM can decide/control what to do with respect to mitigating
speculation issues.

> +					  &frame.next_frame, sizeof(frame.next_frame)))
> +			break;
> +		if (!perf_guest_read_virt((unsigned long)&fp->return_address,
> +					  &frame.return_address, sizeof(frame.return_address)))
> +			break;
> +		perf_callchain_store(entry, cs_base + frame.return_address);
> +		fp = (void *)(ss_base + frame.next_frame);
> +	}
> +}
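
Very roughly, and completely untested, something like the below is what I
have in mind.  perf_guest_fill_callchain() is a placeholder name for the
single KVM-provided hook (however it ends up being registered), and the
exact split of work between perf core and the hook is obviously up for
discussion; perf_guest_state(), perf_callchain_store_context() and the
PERF_GUEST_* / PERF_CONTEXT_GUEST_* values are the existing perf
definitions.

static void perf_callchain_guest(struct perf_callchain_entry_ctx *entry)
{
	unsigned int guest_state = perf_guest_state();

	/* Nothing to do if no guest is running on this CPU. */
	if (!(guest_state & PERF_GUEST_ACTIVE))
		return;

	/* Tag the entries so tooling knows they came from a guest. */
	if (guest_state & PERF_GUEST_USER)
		perf_callchain_store_context(entry, PERF_CONTEXT_GUEST_USER);
	else
		perf_callchain_store_context(entry, PERF_CONTEXT_GUEST_KERNEL);

	/*
	 * Placeholder for the single hook: KVM walks the guest stack however
	 * it sees fit and fills @entry via perf_callchain_store(), instead
	 * of perf reading guest memory itself.
	 */
	perf_guest_fill_callchain(entry);
}

That way the guest memory accesses stay entirely inside KVM, and KVM decides
what, if any, speculation mitigations are needed.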