On Thu, Nov 14, 2024 at 4:17 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Nov 14, 2024 at 2:20 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Thu, 2024-11-07 at 09:50 -0800, Eduard Zingerman wrote:
> > > Consider dead code elimination problem for program like below:
> > >
> > > main:
> > >   1: r1 = 42
> > >   2: call <subprogram>;
> > >   3: exit
> > >
> > > subprogram:
> > >   4: r0 = 1
> > >   5: if r1 != 42 goto +1
> > >   6: r0 = 2
> > >   7: exit;
> > >
> > > Here verifier would visit every instruction and thus
> > > bpf_insn_aux_data->seen flag would be set for both true (7)
> > > and fallthrough (6) branches of conditional (5).
> > > Hence opt_hard_wire_dead_code_branches() will not replace
> > > conditional (5) with unconditional jump.
> >
> > [...]
> >
> > Had an off-list discussion with Alexei yesterday,
> > here are some answers to questions raised:
> > - The patches #1,2 with opt_hard_wire_dead_code_branches() changes are
> >   not necessary for dynptr_slice kfunc inlining / branch removal.
> >   I will drop these patches and adjust test cases.
> > - Did some measurements for dynptr_slice call using simple benchmark
> >   from patch #11:
> >   - baseline:
> >     76.167 ± 0.030M/s million calls per second;
> >   - with call inlining, but without branch pruning (only patch #3):
> >     101.198 ± 0.101M/s million calls per second;
> >   - with call inlining and with branch pruning (full patch-set):
> >     116.935 ± 0.142M/s million calls per second.
> >
>
> This true/false logic seems generally useful not just for this use
> case, is there anything wrong with landing it? Seems pretty
> straightforward. I'd split it from the kfunc inlining and land
> independently.

I was also always hoping that we'll eventually optimize the following pattern:

  r1 = *(global var)
  if r1 == 1 /* always 1 or 0 */
    goto +...
  ...

This is extremely common with .rodata global variables, and while the
branches are dead code eliminated, memory reads are not.
Not sure how involved it would be to do this.
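
For reference, a rough sketch of where this pattern usually comes from at
the C source level. It is plain C standing in for a BPF program, so it
builds without any BPF headers; the names enable_feature and handle_event
are hypothetical and only there for illustration. The assumption is the
usual libbpf convention where a "const volatile" global is placed in
.rodata and its value is fixed before the program is loaded.

```c
/*
 * Plain C stand-in for a BPF program; all names are hypothetical.
 * Assumption: in a real BPF object this global would live in .rodata,
 * be set by the loader before load, and be read-only afterwards.
 */
const volatile int enable_feature = 1; /* always 1 or 0 */

/* Hypothetical handler: doubles its input when the feature is on. */
int handle_event(int x)
{
	/*
	 * 'volatile' forces the compiler to emit a real load, which is
	 * what produces the sequence discussed above:
	 *   r1 = *(global var)
	 *   if r1 == 1 goto +...
	 * The untaken side is removed as dead code, but the load stays.
	 */
	if (enable_feature == 1)
		return x * 2;

	return x;
}
```

With the value known to be 1, only the "x * 2" path is ever live, yet the
read of enable_feature is still executed on every call, which is the
leftover cost described above.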