Re: [PATCH v2 bpf-next 2/2] bpf: inline bpf_get_branch_snapshot() helper

On Wed, Apr 3, 2024 at 10:51 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Apr 2, 2024 at 12:05 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >
> > Inline bpf_get_branch_snapshot() helper using architecture-agnostic
> > inline BPF code which calls directly into underlying callback of
> > perf_snapshot_branch_stack static call. This callback is set early
> > during kernel initialization and is never updated or reset, so it's ok
> > to fetch the actual implementation using static_call_query() and call
> > directly into it.
> >
> > This change eliminates a full function call and saves one LBR entry
> > in PERF_SAMPLE_BRANCH_ANY LBR mode.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c | 55 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 55 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index fcb62300f407..49789da56f4b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -20157,6 +20157,61 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         goto next_insn;
> >                 }
> >
> > +               /* Implement bpf_get_branch_snapshot inline. */
> > +               if (prog->jit_requested && BITS_PER_LONG == 64 &&
> > +                   insn->imm == BPF_FUNC_get_branch_snapshot) {
> > +                       /* We are dealing with the following func protos:
> > +                        * u64 bpf_get_branch_snapshot(void *buf, u32 size, u64 flags);
> > +                        * int perf_snapshot_branch_stack(struct perf_branch_entry *entries, u32 cnt);
> > +                        */
> > +                       const u32 br_entry_size = sizeof(struct perf_branch_entry);
> > +
> > +                       /* struct perf_branch_entry is part of UAPI and is
> > +                        * used as an array element, so extremely unlikely to
> > +                        * ever grow or shrink
> > +                        */
> > +                       BUILD_BUG_ON(br_entry_size != 24);
> > +
> > +                       /* if (unlikely(flags)) return -EINVAL */
> > +                       insn_buf[0] = BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 0, 7);
>
> optimizing this out with is_reg_const(R3) is probably overkill?
> Just mentioning in case you need to skip this branch.

Probably overkill; we'd need to make sure that every code path has R3
as a constant zero.

Also note that these not-taken branches don't pollute even the most
detailed PERF_SAMPLE_BRANCH_ANY mode; the CPU only records non-linear
changes in code execution, so it's fine from my perspective.

>
> > +
> > +                       /* Transform size (bytes) into number of entries (cnt = size / 24).
> > +                        * But to avoid an expensive division instruction, we implement
> > +                        * divide-by-3 through multiplication, followed by further
> > +                        * division by 8 through 3-bit right shift.
> > +                        * Refer to book "Hacker's Delight, 2nd ed." by Henry S. Warren, Jr.,
> > +                        * p. 227, chapter "Unsigned Division by 3" for details and proofs.
> > +                        *
> > +                        * N / 3 <=> M * N / 2^33, where M = (2^33 + 1) / 3 = 0xaaaaaaab.
> > +                        */
> > +                       insn_buf[1] = BPF_MOV32_IMM(BPF_REG_0, 0xaaaaaaab);
> > +                       insn_buf[2] = BPF_ALU64_REG(BPF_MUL, BPF_REG_2, BPF_REG_0);
> > +                       insn_buf[3] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36);
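
FWIW, the constant and the shift can be sanity-checked exhaustively in
user space. Here's a minimal sketch (assumes a 64-bit host); it confirms
that (n * 0xaaaaaaab) >> 36 == n / 24 for every u32 n, which is exactly
what the three instructions above compute:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* n / 3 == (n * 0xaaaaaaab) >> 33 holds for all 32-bit n, so the
	 * extra >> 3 (a division by 8) makes the total shift 36 and the
	 * result n / 24. Exhaustive over all u32 values; takes a while.
	 */
	for (uint64_t n = 0; n <= UINT32_MAX; n++)
		assert(((n * 0xaaaaaaabULL) >> 36) == n / 24);

	printf("n / 24 == (n * 0xaaaaaaab) >> 36 for all u32 n\n");
	return 0;
}
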
> > +
> > +                       /* call perf_snapshot_branch_stack implementation */
> > +                       insn_buf[4] = BPF_EMIT_CALL(static_call_query(perf_snapshot_branch_stack));
> > +                       /* if (entry_cnt == 0) return -ENOENT */
> > +                       insn_buf[5] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4);
> > +                       /* return entry_cnt * sizeof(struct perf_branch_entry) */
> > +                       insn_buf[6] = BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, br_entry_size);
> > +                       insn_buf[7] = BPF_JMP_A(3);
> > +                       /* return -EINVAL; */
> > +                       insn_buf[8] = BPF_MOV64_IMM(BPF_REG_0, -EINVAL);
> > +                       insn_buf[9] = BPF_JMP_A(1);
> > +                       /* return -ENOENT; */
> > +                       insn_buf[10] = BPF_MOV64_IMM(BPF_REG_0, -ENOENT);
> > +                       cnt = 11;
> > +
> > +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> > +                       if (!new_prog)
> > +                               return -ENOMEM;
> > +
> > +                       delta    += cnt - 1;
> > +                       env->prog = prog = new_prog;
> > +                       insn      = new_prog->insnsi + i + delta;
> > +                       continue;
> > +               }
> > +
> >                 /* Implement bpf_kptr_xchg inline */
> >                 if (prog->jit_requested && BITS_PER_LONG == 64 &&
> >                     insn->imm == BPF_FUNC_kptr_xchg &&
> > --
> > 2.43.0
> >
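
For readability, here is roughly what the emitted sequence computes,
expressed as plain C. This is just a sketch for review purposes, not
actual kernel code: the function name is made up, the JMP_A instructions
map to plain control flow, and perf_snapshot_branch_stack() stands in
for whatever implementation static_call_query() returns:

static u64 inlined_get_branch_snapshot(void *buf, u32 size, u64 flags)
{
	u32 cnt;

	if (flags)					/* insn_buf[0] */
		return -EINVAL;				/* insn_buf[8] */
	cnt = ((u64)size * 0xaaaaaaab) >> 36;		/* insn_buf[1..3]: size / 24 */
	cnt = perf_snapshot_branch_stack(buf, cnt);	/* insn_buf[4] */
	if (cnt == 0)					/* insn_buf[5] */
		return -ENOENT;				/* insn_buf[10] */
	return cnt * sizeof(struct perf_branch_entry);	/* insn_buf[6], [7] */
}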