Alexei Starovoitov writes:

> On Thu, Apr 25, 2019 at 12:07:06AM +0100, Jiong Wang wrote:
>>
>> Alexei Starovoitov writes:
>>
>> > Add two tests to check that sequence of 1024 jumps is verifiable.
>> >
>> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>> > ---
>> >  tools/testing/selftests/bpf/test_verifier.c  | 70 ++++++++++++++++++++
>> >  tools/testing/selftests/bpf/verifier/scale.c | 18 +++++
>>
>> I am rebasing the 32-bit opt pass on top of latest bpf-next and found these
>> new tests take more than 20 minutes to run and still had not finished after
>> that.
>>
>> The reason is the following insn filling inside bpf_fill_scale1, which
>> generates nearly 1M insns whose results are recognized as safe to be
>> poisoned:
>>
>>   bpf_fill_scale1:
>>     while (i < MAX_TEST_INSNS - 1025)
>>       insn[i++] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 42);
>>
>> For each hi32 poisoning there is one call to "bpf_patch_insn_data", which
>> is not cheap (adjust jump insns, insn aux info, etc.). 1M calls to it have
>> exhausted server resources as described; after 20 minutes the run still
>> had not finished.
>>
>> For real world applications we don't do hi32 poisoning, and there isn't
>> much lo32 zext. Benchmarking the bpf programs inside Cilium shows the
>> final zext pass adds about 8% ~ 15% verification time.
>>
>> The zext pass built on top of "bpf_patch_insn_data" looks more and more
>> like it is not the best approach to utilize the read32 analysis results.
>>
>> Previously, in the v1 cover letter, I listed some of my other thoughts on
>> how to utilize the liveness analysis results:
>>
>>   1. Minor change on the back-end JIT hook: also pass aux_insn information
>>      to back-ends, so they have per-insn information and can do zero
>>      extension for the marked insns themselves using the most efficient
>>      native insn.
>>
>>   2. Introduce a zero extension insn for eBPF. Then the verifier could
>>      insert the new zext insn instead of lshift + rshift. zext could be
>>      JITed more efficiently.
>>
>>   3. Otherwise JIT back-ends need to do a peephole pass to catch lshift +
>>      rshift and turn them into native zext.
>
> all options sounds like hacks to workaround inefficient bpf_patch_insn_data().
> Especially option 2 will work only because single insn is replaced
> with another insn ?

Option 1 should be a generic solution. It passes verifier analysis results,
generated by the insn walk, down to JIT back-ends. The information passed
down could be any analysis result useful for JIT code-gen.

> Let's fix the algo of bpf_patch_insn_data() instead, so that 1 insn -> 2+ insn
> is also fast.

The issue with 1 insn -> 2+ insn is the call to bpf_adj_branches, which does
another for_each_insn_in_prog traversal, so the zext insertion becomes
something like:

  for_each_insn_in_prog
    ...
    if (zext)
      ...
      for_each_insn_in_prog

which is quadratic.

One solution is to chain all branch insns during a previous insn traversal,
for example the cfg check, and keep the information somewhere in bpf_prog
(env->insn_aux_data would be a good place to keep such information, but the
insn patch helpers are supposed to work with bpf_prog). Then bpf_adj_branches
could traverse this chain instead of iterating through all insns. Rough
sketches of options 1/2 and of this chaining idea are in the P.S. below.

Regards,
Jiong
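
P.S. To make the above a bit more concrete, below are rough sketches of
options 1 and 2 and of the branch chaining idea. Every name used here that
is not in today's tree (insn_needs_zext, emit_native_zext, BPF_ZEXT,
next_branch, chain_branches) is made up purely for illustration.

For option 1, instead of patching insns, the verifier would export its
per-insn liveness conclusion, e.g. through bpf_prog_aux, and each JIT
back-end would emit its own native zero extension where the flag is set:

  /* hypothetical field, filled in from the verifier's liveness info */
  struct bpf_prog_aux {
          /* ... existing fields ... */
          bool *insn_needs_zext;          /* one flag per insn */
  };

  /* in a JIT back-end, after emitting insn i with a 32-bit result */
  if (prog->aux->insn_needs_zext[i])
          emit_native_zext(dst_reg);      /* hypothetical helper */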
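
For option 2, the difference is patching in one insn instead of two after
each marked definition, which also gives JITs a single insn to map to a
native zero extend rather than a shift pair to pattern match:

  /* the shift-pair approach: two patched insns per marked definition */
  BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
  BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32),

  /* with a dedicated insn (BPF_ZEXT is a made-up encoding) */
  BPF_ZEXT(BPF_REG_2),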
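
For the chaining idea, something along these lines, built during an insn
walk we already do (where to store the chain is the open question mentioned
above, since insn_aux_data lives in the verifier env, not in bpf_prog):

  /* sketch: remember the insn index of the next jump insn, so that
   * bpf_adj_branches() can hop from branch to branch instead of
   * scanning every insn of the program.
   */
  static void chain_branches(struct bpf_verifier_env *env)
  {
          int i, prev = -1;

          for (i = 0; i < env->prog->len; i++) {
                  u8 class = BPF_CLASS(env->prog->insnsi[i].code);

                  if (class != BPF_JMP && class != BPF_JMP32)
                          continue;
                  if (prev >= 0)
                          env->insn_aux_data[prev].next_branch = i;
                  prev = i;
          }
          if (prev >= 0)
                  env->insn_aux_data[prev].next_branch = -1;
  }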