Re: [RFC bpf-next 01/11] bpf: use branch predictions in opt_hard_wire_dead_code_branches()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 14 Nov 2024 16:17:55 -0800

On Thu, Nov 14, 2024 at 2:20 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Thu, 2024-11-07 at 09:50 -0800, Eduard Zingerman wrote:
> > Consider the dead code elimination problem for a program like the one below:
> >
> >     main:
> >       1: r1 = 42
> >       2: call <subprogram>
> >       3: exit
> >
> >     subprogram:
> >       4: r0 = 1
> >       5: if r1 != 42 goto +1
> >       6: r0 = 2
> >       7: exit
> >
> > Here the verifier would visit every instruction, and thus the
> > bpf_insn_aux_data->seen flag would be set for both the true (7)
> > and fallthrough (6) branches of conditional (5), even though
> > r1 is always 42 and (7) is only ever reached by falling through
> > from (6). Hence opt_hard_wire_dead_code_branches() will not
> > replace conditional (5) with an unconditional jump.
>
> [...]
>
> I had an off-list discussion with Alexei yesterday;
> here are answers to some of the questions raised:
> - Patches #1 and #2, which change opt_hard_wire_dead_code_branches(),
>   are not necessary for dynptr_slice kfunc inlining / branch removal.
>   I will drop these patches and adjust the test cases.
> - I did some measurements of the dynptr_slice call using the simple
>   benchmark from patch #11:
>   - baseline:
>     76.167 ± 0.030M calls per second;
>   - with call inlining, but without branch pruning (only patch #3):
>     101.198 ± 0.101M calls per second;
>   - with call inlining and with branch pruning (full patch-set):
>     116.935 ± 0.142M calls per second.
>

This true/false logic seems generally useful, not just for this use
case; is there anything wrong with landing it? It seems pretty
straightforward. I'd split it from the kfunc inlining and land it
independently.