On Mon, Dec 14, 2020 at 04:31:44PM +0100, Daniel Borkmann wrote:
> On 12/14/20 9:15 AM, Gary Lin wrote:
> > On Mon, Dec 14, 2020 at 11:56:22AM +0800, Gary Lin wrote:
> > > On Fri, Dec 11, 2020 at 09:05:05PM +0100, Daniel Borkmann wrote:
> > > > On 12/11/20 9:19 AM, Gary Lin wrote:
> > > > > The x64 bpf jit expects bpf images to converge within the given passes, but
> > > > > it could fail to do so in some corner cases. For example:
> > > > >
> > > > > l0: ldh [4]
> > > > > l1: jeq #0x537d, l2, l40
> > > > > l2: ld [0]
> > > > > l3: jeq #0xfa163e0d, l4, l40
> > > > > l4: ldh [12]
> > > > > l5: ldx #0xe
> > > > > l6: jeq #0x86dd, l41, l7
> > > > > l8: ld [x+16]
> > > > > l9: ja 41
> > > > >
> > > > > [... repeated ja 41 ]
> > > > >
> > > > > l40: ja 41
> > > > > l41: ret #0
> > > > > l42: ld #len
> > > > > l43: ret a
> > > > >
> > > > > This bpf program contains 32 "ja 41" instructions which are effectively
> > > > > NOPs and designed to be replaced with valid code dynamically. Ideally,
> > > > > bpf jit should optimize those "ja 41" instructions out when translating
> > > > > the bpf instructions into x86_64 machine code. However, do_jit() can
> > > > > only remove one "ja 41" for offset==0 on each pass, so it requires at
> > > > > least 32 runs to eliminate those JMPs and exceeds the current limit of
> > > > > passes (20). In the end, the program got rejected when BPF_JIT_ALWAYS_ON
> > > > > is set even though it's legit as a classic socket filter.
> > > > >
> > > > > To make the image more likely to converge within 20 passes, this commit
> > > > > pads some instructions with NOPs in the last 5 passes:
> > > > >
> > > > > 1. conditional jumps
> > > > > A possible size variance comes from the adoption of imm8 JMP. If the
> > > > > offset is imm8, we calculate the size difference of this BPF instruction
> > > > > between the previous pass and the current pass and fill the gap with NOPs.
> > > > > To avoid recalculating the jump offset, those NOPs are inserted before
> > > > > the JMP code, so we have to subtract the 2 bytes of the imm8 JMP when
> > > > > calculating the number of NOPs.
> > > > >
> > > > > 2. BPF_JA
> > > > > There are two conditions for BPF_JA.
> > > > > a.) nop jumps
> > > > > If this instruction was not optimized out in the previous pass,
> > > > > instead of removing it, we insert the equivalent size of NOPs.
> > > > > b.) label jumps
> > > > > Similar to conditional jumps, we prepend NOPs right before the JMP
> > > > > code.
> > > > >
> > > > > To make the code concise, emit_nops() is modified to use a signed len and
> > > > > return the number of inserted NOPs.
> > > > >
> > > > > To support bpf-to-bpf, a new flag, 'padded', is introduced to 'struct bpf_prog'
> > > > > so that bpf_int_jit_compile() can know whether the program is padded or not.
> > > >
> > > > Please also add multiple hand-crafted test cases e.g. for bpf-to-bpf calls into
> > > > test_verifier (which is part of bpf kselftests) that would exercise this corner
> > > > case in x86 jit where we would start to nop pad so that there is proper coverage,
> > > > too.
> > > >
> > > The corner case I had in the commit description is likely to be rejected by
> > > the verifier because most of those "ja 41" are unreachable instructions.
> > > Is there any known test case that needs more than 15 passes in x86 jit?
> > >
> > Just an idea. Besides the mentioned corner case, how about making
> > PADDING_PASSES dynamically configurable (sysfs?) and reusing the existing
> > test cases? Then we could have a script set PADDING_PASSES from 1 to 20
> > and run the bpf selftests for each value. This guarantees that the padding
> > strategy will be exercised for at least some PADDING_PASSES settings.
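The padding arithmetic described in the quoted commit message can be sketched in user-space C. This is a hedged illustration, not the patch: emit_nops_sketch(), jcc_imm8_pad(), emit_padded_jcc8(), NOP_MAX and the NOP table (the multi-byte NOP encodings recommended in the Intel SDM) are assumptions for this sketch, not the kernel's code. It shows why the NOPs are placed before the jump: the instruction keeps the size it had in the previous pass, so the jump still ends where the offset calculation assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NOP_MAX        8  /* hypothetical: longest NOP encoding used below */
#define JCC_IMM8_SIZE  2  /* "7x rel8" short conditional jump is 2 bytes */

/* Multi-byte NOP encodings recommended in the Intel SDM, indexed by length. */
static const uint8_t nop_table[NOP_MAX + 1][NOP_MAX] = {
	[1] = { 0x90 },
	[2] = { 0x66, 0x90 },
	[3] = { 0x0f, 0x1f, 0x00 },
	[4] = { 0x0f, 0x1f, 0x40, 0x00 },
	[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[6] = { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[7] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	[8] = { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Hypothetical stand-in for emit_nops(): len is signed, so a zero or
 * negative request emits nothing. Returns the number of bytes written. */
static int emit_nops_sketch(uint8_t *buf, int len)
{
	int cnt = 0;

	while (len > 0) {
		int n = len > NOP_MAX ? NOP_MAX : len;

		memcpy(buf + cnt, nop_table[n], n);
		cnt += n;
		len -= n;
	}
	return cnt;
}

/* prev_size: bytes the BPF insn took in the previous pass.
 * cur_size:  bytes already emitted for it in this pass (e.g. a compare).
 * Returns the NOP bytes needed in front of a 2-byte imm8 jcc so the insn
 * keeps its previous size. */
static int jcc_imm8_pad(int prev_size, int cur_size)
{
	return prev_size - cur_size - JCC_IMM8_SIZE;
}

/* Append "[NOPs] jcc rel8" after the cur_size bytes already in buf and
 * return the insn's total size. The NOPs go first so the jump still ends
 * where the previous pass's offset calculation assumed. */
static int emit_padded_jcc8(uint8_t *buf, int prev_size, int cur_size,
			    uint8_t jcc_opcode, int8_t rel8)
{
	int len = cur_size;

	/* In a padding pass the insn should never need to grow. */
	assert(jcc_imm8_pad(prev_size, cur_size) >= 0);
	len += emit_nops_sketch(buf + len, jcc_imm8_pad(prev_size, cur_size));
	buf[len++] = jcc_opcode;
	buf[len++] = (uint8_t)rel8;
	return len;
}
```

For example, if the previous pass emitted a hypothetical 4-byte compare plus a 6-byte imm32 jcc (10 bytes) and the offset now fits in imm8, jcc_imm8_pad(10, 4) yields 4 NOP bytes, exactly the 6 - 2 = 4 bytes the short encoding saves; if the previous pass already used imm8 (6 bytes in total), it yields 0. For a BPF_JA whose offset has dropped to 0, the analogous step would be emit_nops_sketch(buf, prev_size).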
>
> I think exposing such implementation details to users is not that great as they
> normally should not need to worry about these things (plus it's also rarely hit
> in practice when developing against llvm). On top of all that, such a knob would
> have no meaning in the case of other JITs since most other non-x86 ones have a fixed
> number of passes. I think it's probably useful for local testing of the fix, but
> less suitable for exposing as sysctl 'uapi' upstream. Re crafting a test case for
> bpf-2-bpf calls, you could take bpf_fill_maxinsns10() in lib/test_bpf.c as a model,
> since it also triggers a high number of passes, port it over to test_verifier
> from selftests and experiment from there to integrate calls.
>
Thanks for the hint. Will try bpf_fill_maxinsns10().

Gary Lin
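To see why the 32 "ja 41" instructions in the quoted example overrun the 20-pass limit, and why padding only in the last 5 passes is enough, here is a toy convergence model in C. It is a sketch under loud assumptions, not the x86 JIT: every instruction is a forward unconditional jump to the same exit, displacements always use the previous pass's addresses, the first pass starts from a guess of 64 bytes per instruction, "converged" means no instruction changed size, and padding is modeled as never letting an instruction shrink. ja_size(), passes_to_converge() and MAX_INSNS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PASSES     20  /* pass limit quoted in the thread */
#define PADDING_PASSES 5   /* assumption: pad only in the last 5 passes */
#define MAX_INSNS      64  /* hypothetical toy program size limit */

/* Toy size model for an x86-64 unconditional jump whose displacement is
 * measured from the end of the jump: 0 bytes when it would land on the next
 * instruction (elided), 2 bytes for "EB rel8", 5 bytes for "E9 rel32". */
static int ja_size(int disp)
{
	if (disp == 0)
		return 0;
	if (disp >= -128 && disp <= 127)
		return 2;
	return 5;
}

/* Toy program: insns 0..n-1 are all "ja" to the exit right after insn n-1.
 * Returns the 1-based pass on which no insn changed size, or -1 if that did
 * not happen within MAX_PASSES. */
static int passes_to_converge(int n, bool padding)
{
	int size[MAX_INSNS], addrs[MAX_INSNS];

	assert(n > 0 && n <= MAX_INSNS);

	/* Pessimistic first guess: 64 bytes per insn; addrs[i] is the end of insn i. */
	for (int i = 0; i < n; i++) {
		size[i] = 64;
		addrs[i] = 64 * (i + 1);
	}

	for (int pass = 1; pass <= MAX_PASSES; pass++) {
		bool pad = padding && pass > MAX_PASSES - PADDING_PASSES;
		bool changed = false;
		int new_size[MAX_INSNS];

		for (int i = 0; i < n; i++) {
			/* In this toy, both ends come from the previous pass. */
			int sz = ja_size(addrs[n - 1] - addrs[i]);

			/* Padding pass: fill any shrinkage with NOPs instead. */
			if (pad && sz < size[i])
				sz = size[i];
			if (sz != size[i])
				changed = true;
			new_size[i] = sz;
		}

		/* Lay out the image again from the new sizes. */
		int end = 0;
		for (int i = 0; i < n; i++) {
			size[i] = new_size[i];
			end += size[i];
			addrs[i] = end;
		}

		if (!changed)
			return pass;
	}
	return -1;
}
```

Under these assumptions, passes_to_converge(32, false) elides only one more trailing jump per pass and returns -1, passes_to_converge(32, true) stops changing on pass 16 (the first padding pass), and a short chain such as passes_to_converge(4, false) settles on pass 5 without any padding.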