On Tue, Feb 13, 2024 at 04:23:11PM -0800, Andrii Nakryiko wrote:
> Real-world BPF applications keep growing in size. A medium-sized
> production application can easily have 50K+ verified instructions, and
> its line info section in .BTF.ext has more than 3K entries.
>
> When the verifier emits a log with log_level>=1, it annotates assembly
> code with the matching original C source code. Currently it uses a
> linear search over the line info records to find a match. As the
> complexity of BPF applications grows, this O(K * N) approach scales
> poorly.
>
> So, instead of a linear O(N) search for the line info record, let's use
> an equivalent but faster O(log(N)) binary search algorithm. It's not a
> plain binary search, as we don't look for an exact match. It's an upper
> bound search variant, looking for the rightmost line info record that
> starts at or before the given insn_off.
>
> Some unscientific measurements were done before and after this change.
> They were done in a VM and fluctuate a bit, but overall the speed up
> is undeniable.
>
> BASELINE
> ========
> File                             Program          Duration (us) Insns
> -------------------------------- ---------------- ------------- ------
> katran.bpf.o                     balancer_ingress 2497130       343552
> pyperf600.bpf.linked3.o          on_event         12389611      627288
> strobelight_pyperf_libbpf.o      on_py_event      387399        52445
> -------------------------------- ---------------- ------------- ------
>
> BINARY SEARCH
> =============
>
> File                             Program          Duration (us) Insns
> -------------------------------- ---------------- ------------- ------
> katran.bpf.o                     balancer_ingress 2339312       343552
> pyperf600.bpf.linked3.o          on_event         5602203       627288
> strobelight_pyperf_libbpf.o      on_py_event      294761        52445
> -------------------------------- ---------------- ------------- ------
>
> While Katran's speed up is pretty modest (about 158ms, or 6%), for the
> production pyperf BPF program (on_py_event) it's much greater already,
> going from 387ms down to 295ms (a 24% improvement).
>
> Looking at BPF selftests' biggest pyperf example, we see an even more
> dramatic improvement, shaving off more than 50% of the time, going
> from 12.4s down to 5.6s.

nice speedup

Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

jirka

> The amount of improvement is, among other things, a function of the
> overall number of BPF assembly instructions in the .bpf.o file (which
> determines how many line info records there will be and thus, on
> average, how long the linear search will take):
>
> $ llvm-objdump -d katran.bpf.o | wc -l
> 3863
> $ llvm-objdump -d strobelight_pyperf_libbpf.o | wc -l
> 6997
> $ llvm-objdump -d pyperf600.bpf.linked3.o | wc -l
> 87854
>
> Granted, this only applies to debugging cases (e.g., using veristat, or
> failing verification in production), but it seems worth doing to
> improve the overall developer experience anyway.
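As an aside for readers who want to try the idea outside the kernel:
below is a minimal userspace sketch of the same "rightmost record at or
before insn_off" upper-bound search. It is not the kernel code (the
actual patch follows below); the insn_offs[] array values and the
find_linfo_idx() helper name are made up purely for illustration.

#include <assert.h>
#include <stdio.h>

/* toy stand-in for struct bpf_line_info: just the insn_off keys, sorted */
static const unsigned int insn_offs[] = { 0, 4, 9, 15, 23 };

/* return index of the rightmost entry with insn_offs[i] <= insn_off;
 * insn_offs[0] == 0, so a valid answer always exists
 */
static int find_linfo_idx(unsigned int insn_off, int n)
{
	int l = 0, r = n - 1, m;

	while (l < r) {
		m = l + (r - l + 1) / 2; /* break ties to the right */
		if (insn_offs[m] <= insn_off)
			l = m;
		else
			r = m - 1;
	}
	return l;
}

int main(void)
{
	int n = sizeof(insn_offs) / sizeof(insn_offs[0]);

	assert(find_linfo_idx(0, n) == 0);   /* exact match on first entry */
	assert(find_linfo_idx(10, n) == 2);  /* falls between 9 and 15 */
	assert(find_linfo_idx(23, n) == 4);  /* exact match on last entry */
	assert(find_linfo_idx(99, n) == 4);  /* past the end -> last entry */
	printf("all lookups OK\n");
	return 0;
}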
>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  kernel/bpf/log.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> index 594a234f122b..2dfac246a27d 100644
> --- a/kernel/bpf/log.c
> +++ b/kernel/bpf/log.c
> @@ -333,7 +333,8 @@ find_linfo(const struct bpf_verifier_env *env, u32 insn_off)
>  {
>  	const struct bpf_line_info *linfo;
>  	const struct bpf_prog *prog;
> -	u32 i, nr_linfo;
> +	u32 nr_linfo;
> +	int l, r, m;
>  
>  	prog = env->prog;
>  	nr_linfo = prog->aux->nr_linfo;
> @@ -342,11 +343,30 @@ find_linfo(const struct bpf_verifier_env *env, u32 insn_off)
>  		return NULL;
>  
>  	linfo = prog->aux->linfo;
> -	for (i = 1; i < nr_linfo; i++)
> -		if (insn_off < linfo[i].insn_off)
> -			break;
> +	/* Loop invariant: linfo[l].insn_off <= insn_off.
> +	 * linfo[0].insn_off == 0, which always satisfies the above condition.
> +	 * Binary search is searching for the rightmost linfo entry that
> +	 * satisfies the above invariant, giving us the desired record that
> +	 * covers the given instruction offset.
> +	 */
> +	l = 0;
> +	r = nr_linfo - 1;
> +	while (l < r) {
> +		/* (r - l + 1) / 2 means we break a tie to the right, so if:
> +		 * l=1, r=2, linfo[l].insn_off <= insn_off, linfo[r].insn_off > insn_off,
> +		 * then m=2, we see that linfo[m].insn_off > insn_off, and so
> +		 * r becomes 1 and we exit the loop with the correct l==1.
> +		 * If the tie were broken to the left, m=1 would end us up in
> +		 * an endless loop where l and m stay at 1 and r stays at 2.
> +		 */
> +		m = l + (r - l + 1) / 2;
> +		if (linfo[m].insn_off <= insn_off)
> +			l = m;
> +		else
> +			r = m - 1;
> +	}
>  
> -	return &linfo[i - 1];
> +	return &linfo[l];
>  }
>  
>  static const char *ltrim(const char *s)
> -- 
> 2.39.3
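To make the tie-breaking argument in the patch comment concrete, here
is a small standalone harness (my own illustration, not part of the
patch; the offs[] values and the search() helper are made-up names)
that runs the same loop with both midpoint choices and an iteration
cap, showing that the left-biased midpoint gets stuck in exactly the
l=1, r=2 state the comment describes:

#include <stdio.h>

/* made-up sorted offsets; offs[0] == 0 mirrors linfo[0].insn_off == 0 */
static const unsigned int offs[] = { 0, 5, 10 };

/* upper-bound search with a selectable midpoint bias and an iteration
 * cap so the broken (left-biased) variant cannot hang the program;
 * returns the found index, or -1 if the loop stopped making progress
 */
static int search(unsigned int key, int bias_right, int *iters)
{
	int l = 0, r = 2, m;

	*iters = 0;
	while (l < r) {
		if (++(*iters) > 16)
			return -1; /* no progress: would loop forever */
		m = bias_right ? l + (r - l + 1) / 2 : l + (r - l) / 2;
		if (offs[m] <= key)
			l = m;
		else
			r = m - 1;
	}
	return l;
}

int main(void)
{
	int iters;

	/* key 7 lands between offs[1] = 5 and offs[2] = 10, i.e. the loop
	 * reaches the l=1, r=2 state discussed in the patch comment
	 */
	printf("tie right: idx=%d, iters=%d\n", search(7, 1, &iters), iters);
	printf("tie left:  idx=%d, iters=%d\n", search(7, 0, &iters), iters);
	return 0;
}

With the right-biased midpoint the l=1/r=2 case converges in two
iterations (idx=1); with the left-biased one, m stays at 1 and l and r
never move, so the iteration cap trips and the harness reports -1.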