Hi Maxim,

At 2023-10-10 16:12 +0000, Maxim Levitsky wrote:
> > +static inline void
> > +perf_callchain_guest32(struct perf_callchain_entry_ctx *entry)
> > +{
> > +	struct stack_frame_ia32 frame;
> > +	const struct stack_frame_ia32 *fp;
> > +
> > +	fp = (void *)perf_guest_get_frame_pointer();
> > +	while (fp && entry->nr < entry->max_stack) {
> > +		if (!perf_guest_read_virt(&fp->next_frame, &frame.next_frame,
> This should be fp->next_frame.
> > +			sizeof(frame.next_frame)))
> > +			break;
> > +		if (!perf_guest_read_virt(&fp->return_address, &frame.return_address,
> Same here.
> > +			sizeof(frame.return_address)))
> > +			break;
> > +		perf_callchain_store(entry, frame.return_address);
> > +		fp = (void *)frame.next_frame;
> > +	}
> > +}
> > +

Here `fp` holds an address in guest memory, not in the directly
accessible kernel address space, so it cannot be dereferenced directly.
`&fp->next_frame` and `&fp->return_address` do not read through `fp`;
they simply calculate address offsets in a more readable manner, much
like the byte offsets `fp + 0` and `fp + 4`. The resulting guest
addresses are then passed to `perf_guest_read_virt`. The original
implementations of `perf_callchain_user` and `perf_callchain_user32`
also use this approach [1].

>
> For symmetry, maybe it makes sense to have perf_callchain_guest32 and perf_callchain_guest64
> and then make perf_callchain_guest call each? No strong opinion on this of course.
>

The `perf_callchain_guest` and `perf_callchain_guest32` here are simply
designed to mimic `perf_callchain_user` and `perf_callchain_user32` [2].
I'm also open to making the logic fully separate if this doesn't seem
elegant enough.

[1] https://github.com/torvalds/linux/blob/master/arch/x86/events/core.c#L2890
[2] https://github.com/torvalds/linux/blob/master/arch/x86/events/core.c#L2820

Best regards,
Tianyi Liu
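
P.S. To illustrate the point about `&fp->next_frame` being plain address
arithmetic, here is a minimal userspace sketch, not kernel code. It
assumes the two-field `u32` layout of `struct stack_frame_ia32`, and the
helpers `member_offset` and `offsets_match` are hypothetical names made
up for this example. It only checks that taking a member's address gives
the same result as adding the member's `offsetof()` to the frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's 32-bit frame layout: two u32 fields. */
struct stack_frame_ia32 {
	uint32_t next_frame;
	uint32_t return_address;
};

/* Hypothetical helper: byte distance from the frame start to a member. */
static size_t member_offset(const struct stack_frame_ia32 *fp,
			    const void *member)
{
	assert((const char *)member >= (const char *)fp);
	return (size_t)((const char *)member - (const char *)fp);
}

/*
 * Hypothetical helper: returns 1 if &fp->member equals fp plus
 * offsetof(member) for both fields, i.e. taking the address involves
 * no load through fp, only an offset computation.
 */
static int offsets_match(void)
{
	struct stack_frame_ia32 frame = { 0, 0 };
	const struct stack_frame_ia32 *fp = &frame;

	return member_offset(fp, &fp->next_frame) ==
		       offsetof(struct stack_frame_ia32, next_frame) &&
	       member_offset(fp, &fp->return_address) ==
		       offsetof(struct stack_frame_ia32, return_address);
}
```

With this layout the two offsets are 0 and 4, which is why the guest
addresses handed to the read helper are effectively `fp + 0` and
`fp + 4` in bytes.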