On Wed, Feb 23, 2022 at 5:10 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Wed, Feb 23, 2022 at 4:33 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 23, 2022 at 4:11 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Wed, Feb 23, 2022 at 4:05 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > > >
> > > > For binaries that are statically linked, consecutive stack frames are
> > > > likely to be in the same VMA and therefore have the same build id.
> > > > As an optimization for this case, we can cache the previous frame's
> > > > VMA, if the new frame has the same VMA as the previous one, reuse the
> > > > previous one's build id. We are holding the MM locks as reader across
> > > > the entire loop, so we don't need to worry about VMA going away.
> > > >
> > > > Tested through "stacktrace_build_id" and "stacktrace_build_id_nmi" in
> > > > test_progs.
> > > >
> > > > Suggested-by: Greg Thelen <gthelen@xxxxxxxxxx>
> > > > Signed-off-by: Hao Luo <haoluo@xxxxxxxxxx>
> > > > ---
> > >
> > > LGTM. Can you share performance numbers before and after?
> > >
> > > Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > >
> >
> > Thanks Andrii.
> >
> > On a real-world workload, we observed that 66% of cpu cycles in
> > __bpf_get_stackid() were spent on build_id_parse() and find_vma().
> > This was before.
> >
> > We haven't evaluated the performance with this patch yet. This
> > optimization seems straightforward, so we plan to upstream it first
> > and then retest.
>
> Ok, once it lands upstream, I'd really appreciate if you can retest
> and update us with numbers. Thanks!

Sure, will do that.
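
For readers without the diff at hand, the caching idea from the patch
description quoted above can be illustrated with a small userspace sketch.
This is not the kernel code from the patch. Every name prefixed with toy_
is hypothetical, the VMA is modeled as a plain address range, and the
build id "parse" is a memcpy standing in for the expensive
build_id_parse() step. The counters exist only to show how many lookups
and parses the cache avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* Hypothetical stand-in for a VMA: an address range [start, end) plus the
 * build id of the backing file. Not the kernel's struct vm_area_struct. */
struct toy_vma {
	unsigned long start, end;
	unsigned char build_id[BUILD_ID_SIZE];
};

/* Hypothetical per-frame record: input ip, output build id. */
struct toy_frame {
	unsigned long ip;
	int has_build_id;
	unsigned char build_id[BUILD_ID_SIZE];
};

static unsigned int toy_lookups;	/* calls to toy_find_vma() */
static unsigned int toy_parses;		/* calls to toy_build_id_parse() */

/* Stand-in for the VMA lookup: linear search over a small array. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t nr_vmas, unsigned long ip)
{
	size_t i;

	toy_lookups++;
	for (i = 0; i < nr_vmas; i++)
		if (ip >= vmas[i].start && ip < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Stand-in for the build id parse: here just a copy, returns 0 on success. */
static int toy_build_id_parse(const struct toy_vma *vma, unsigned char *out)
{
	toy_parses++;
	memcpy(out, vma->build_id, BUILD_ID_SIZE);
	return 0;
}

/* Resolve a build id for each frame, reusing the previous frame's result
 * when the new ip falls inside the previously resolved VMA. */
static void toy_resolve_build_ids(const struct toy_vma *vmas, size_t nr_vmas,
				  struct toy_frame *frames, size_t nr_frames)
{
	const struct toy_vma *prev_vma = NULL;
	const unsigned char *prev_build_id = NULL;
	size_t i;

	assert(frames != NULL || nr_frames == 0);

	for (i = 0; i < nr_frames; i++) {
		unsigned long ip = frames[i].ip;
		const struct toy_vma *vma;

		/* Fast path: same VMA as the previous frame, reuse its id. */
		if (prev_vma && ip >= prev_vma->start && ip < prev_vma->end) {
			memcpy(frames[i].build_id, prev_build_id, BUILD_ID_SIZE);
			frames[i].has_build_id = 1;
			continue;
		}

		/* Slow path: look up the VMA and parse its build id. */
		vma = toy_find_vma(vmas, nr_vmas, ip);
		if (!vma || toy_build_id_parse(vma, frames[i].build_id)) {
			memset(frames[i].build_id, 0, BUILD_ID_SIZE);
			frames[i].has_build_id = 0;
			continue;	/* cache is left unchanged */
		}
		frames[i].has_build_id = 1;
		prev_vma = vma;
		prev_build_id = frames[i].build_id;
	}
}
```

In this sketch a trace whose frames mostly land in one mapping, as expected
for a statically linked binary, pays for one lookup and one parse per run
of frames in the same range rather than one per frame. The cached pointer
is only safe to keep because nothing in the toy model frees a VMA during
the loop, which mirrors the point in the patch description about holding
the MM lock as reader across the entire loop.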