Re: [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set

Gary Lin <glin@xxxxxxxx> · Mon, 16 Nov 2020 14:31:50 +0800



On Fri, Nov 13, 2020 at 05:48:31PM -0800, Alexei Starovoitov wrote:
> On Fri, Nov 13, 2020 at 12:40 AM Gary Lin <glin@xxxxxxxx> wrote:
> >
> > The x64 bpf jit expects the bpf images converge within the given passes.
> > However there is a corner case:
> >
> >   l0:     ldh [4]
> >   l1:     jeq #0x537d, l2, l40
> >   l2:     ld [0]
> >   l3:     jeq #0xfa163e0d, l4, l40
> >   l4:     ldh [12]
> >   l5:     ldx #0xe
> >   l6:     jeq #0x86dd, l41, l7
> >   l7:     jeq #0x800, l8, l41
> >   l8:     ld [x+16]
> >   l9:     ja 41
> >
> >     [... repeated ja 41 ]
> >
> >   l40:    ja 41
> >   l41:    ret #0
> >   l42:    ld #len
> >   l43:    ret a
> >
> > The bpf program contains 32 "ja 41" and do_jit() only removes one "ja 41"
> > right before "l41:  ret #0" for offset==0 in each pass, so
> > bpf_int_jit_compile() needs to run do_jit() at least 32 times to
> > eliminate those JMP instructions. Since the current max number of passes
> > is 20, the bpf program couldn't converge within 20 passes and got rejected
> > when BPF_JIT_ALWAYS_ON is set even though it's legit as a classic socket
> > filter.
> >
> > A not-converged image may be not optimal but at least the bpf
> > instructions are translated into x64 machine code. Maybe we could just
> > issue a warning instead so that the program is still loaded and the user
> > is also notified.
> 
> Non-convergence is not about being optimal. It's about correctness.
> If size is different it likely means that at least one jump has the
> wrong offset.
> 
Ah, I see.

> Bumping from 20 to 64 also won't solve it.
> There could be a case where 64 isn't enough either.
True. Increasing the number of passes is just a workaround.

> One of the test_bpf.ko tests can hit any limit, iirc.
Thanks for the pointer. Will look into the tests.

>
> Also we've seen a case where JIT might never converge.
> The iteration N can have size 40, iteration N+1 size 38, iteration N+2 size 40
> and keep oscillating like this.
> 
> I think the fix could be is to avoid optimality in size when pass
> number is getting large.
> Like after pass > 10 BPF_JA could always use 32-bit offset regardless
> of actual addrs[i + insn->off] - addrs[i]; difference.
> There could be other solutions too.
> 
So the size convergence can be ignored if all BPF_JMPs are translated
into 32-bit offset jmp?

Thanks,

Gary Lin