Re: [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 13 Nov 2020 17:48:31 -0800

On Fri, Nov 13, 2020 at 12:40 AM Gary Lin <glin@xxxxxxxx> wrote:
>
> The x64 bpf jit expects the bpf images converge within the given passes.
> However there is a corner case:
>
>   l0:     ldh [4]
>   l1:     jeq #0x537d, l2, l40
>   l2:     ld [0]
>   l3:     jeq #0xfa163e0d, l4, l40
>   l4:     ldh [12]
>   l5:     ldx #0xe
>   l6:     jeq #0x86dd, l41, l7
>   l7:     jeq #0x800, l8, l41
>   l8:     ld [x+16]
>   l9:     ja 41
>
>     [... repeated ja 41 ]
>
>   l40:    ja 41
>   l41:    ret #0
>   l42:    ld #len
>   l43:    ret a
>
> The bpf program contains 32 "ja 41" and do_jit() only removes one "ja 41"
> right before "l41:  ret #0" for offset==0 in each pass, so
> bpf_int_jit_compile() needs to run do_jit() at least 32 times to
> eliminate those JMP instructions. Since the current max number of passes
> is 20, the bpf program couldn't converge within 20 passes and got rejected
> when BPF_JIT_ALWAYS_ON is set even though it's legit as a classic socket
> filter.
>
> A not-converged image may be not optimal but at least the bpf
> instructions are translated into x64 machine code. Maybe we could just
> issue a warning instead so that the program is still loaded and the user
> is also notified.

Non-convergence is not about being optimal. It's about correctness.
If size is different it likely means that at least one jump has the
wrong offset.

Bumping from 20 to 64 also won't solve it.
There could be a case where 64 isn't enough either.
One of the test_bpf.ko tests can hit any limit, iirc.

Also we've seen a case where JIT might never converge.
The iteration N can have size 40, iteration N+1 size 38, iteration N+2 size 40
and keep oscillating like this.

I think the fix could be is to avoid optimality in size when pass
number is getting large.
Like after pass > 10 BPF_JA could always use 32-bit offset regardless
of actual addrs[i + insn->off] - addrs[i]; difference.
There could be other solutions too.