Implement inlining of the bpf_get_branch_snapshot() BPF helper using a
generic BPF assembly approach. This allows reducing LBR record waste
right before LBR records are captured from inside a BPF program.

See the v1 cover letter ([0]) for some visual examples. I dropped them
from v2 because there are multiple independent changes landing and being
reviewed, all of which remove different parts of LBR record waste, so
presenting the final state of LBR "waste" gets more complicated until
all of the pieces land.

  [0] https://lore.kernel.org/bpf/20240321180501.734779-1-andrii@xxxxxxxxxx/

v2->v3:
  - fix BPF_MUL instruction definition;

v1->v2:
  - inlining of bpf_get_smp_processor_id() was split out into a separate
    patch set implementing an internal per-CPU BPF instruction;
  - add efficient divide-by-24-through-multiplication logic, and leave
    comments explaining the idea behind it (see the sketch after the
    diffstat below); this way the inlined version of
    bpf_get_branch_snapshot() has no compromises compared to the
    non-inlined version of the helper (Alexei).

Andrii Nakryiko (2):
  bpf: make bpf_get_branch_snapshot() architecture-agnostic
  bpf: inline bpf_get_branch_snapshot() helper

 kernel/bpf/verifier.c    | 55 ++++++++++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c |  4 ---
 2 files changed, 55 insertions(+), 4 deletions(-)

-- 
2.43.0
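
For reference, here is a minimal userspace C sketch of the
divide-by-24-through-multiplication idea mentioned in the v1->v2
changelog. It is not the verifier code from patch 2, just an
illustration of the arithmetic, and entries_from_bytes() is a name made
up for this sketch: a division by 24 splits into a 3-bit right shift
(divide by 8) and a divide-by-3 done via multiplication by the
reciprocal constant 0xaaaaaaab = (2^33 + 1) / 3 followed by a 33-bit
right shift.

  #include <assert.h>
  #include <stdint.h>

  /* Illustration only: convert an LBR buffer size in bytes into a count
   * of 24-byte entries without a division instruction.
   *
   * size / 24 == (size >> 3) / 3, and n / 3 == (n * 0xaaaaaaab) >> 33
   * holds for any unsigned 32-bit n (unsigned division by a constant
   * through reciprocal multiplication).
   */
  static uint32_t entries_from_bytes(uint32_t size)
  {
          uint32_t n = size >> 3; /* divide by 8 */

          /* divide by 3 via multiply + shift */
          return (uint32_t)(((uint64_t)n * 0xaaaaaaabULL) >> 33);
  }

  int main(void)
  {
          /* sanity-check against a real division */
          for (uint32_t size = 0; size < (1u << 20); size++)
                  assert(entries_from_bytes(size) == size / 24);

          return 0;
  }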