On Thu, Apr 25, 2019 at 12:07:06AM +0100, Jiong Wang wrote:
>
> Alexei Starovoitov writes:
>
> > Add two tests to check that sequence of 1024 jumps is verifiable.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> >  tools/testing/selftests/bpf/test_verifier.c  | 70 ++++++++++++++++++++
> >  tools/testing/selftests/bpf/verifier/scale.c | 18 +++++
>
> I am rebasing the 32-bit opt pass on top of the latest bpf-next and found
> that these new tests take more than 20 minutes to run and had still not
> finished after that.
>
> The reason is that the following insn filling inside bpf_fill_scale1
> generates nearly 1M insns whose results are recognized as safe to be
> poisoned.
>
>   bpf_fill_scale1:
>     while (i < MAX_TEST_INSNS - 1025)
>       insn[i++] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 42);
>
> For each hi32 poisoning there is one call to "bpf_patch_insn_data", which
> is not cheap (it adjusts jump insns, insn aux info, etc.). Now, 1M calls
> to it have exhausted the server's resources as described: after 20
> minutes of running it still had not finished.
>
> For real-world applications we don't do hi32 poisoning, and there isn't
> much lo32 zext. Benchmarking the bpf programs inside Cilium shows the
> final zext pass adds about 8% to 15% to verification time.
>
> A zext pass built on top of "bpf_patch_insn_data" looks less and less
> like the best approach to utilizing the read32 analysis results.
>
> Previously, in the v1 cover letter, I listed some other thoughts on how
> to utilize the liveness analysis results:
>
> 1 Make a minor change to the back-end JIT hook to also pass aux_insn
>   information to back-ends, so they have per-insn information and can
>   do zero extension for the marked insns themselves using the most
>   efficient native insn.
>
> 2 Introduce a zero-extension insn for eBPF. The verifier could then
>   insert the new zext insn instead of lshift + rshift, and zext could
>   be JITed more efficiently.
>
> 3 Otherwise, JIT back-ends need a peephole pass to catch lshift + rshift
>   and turn them into native zext.

All options sound like hacks to work around the inefficient
bpf_patch_insn_data(). Option 2 in particular would only work because a
single insn is replaced with another single insn.
Let's fix the algorithm of bpf_patch_insn_data() instead, so that patching
1 insn -> 2+ insns is also fast.
The main point of bumping the internal limits to 1M and adding these tests
was to expose such algorithmic inefficiencies.