Re: [PATCH] bpf,x64: pad NOPs to make images converge more easily

On 12/14/20 9:15 AM, Gary Lin wrote:
On Mon, Dec 14, 2020 at 11:56:22AM +0800, Gary Lin wrote:
On Fri, Dec 11, 2020 at 09:05:05PM +0100, Daniel Borkmann wrote:
On 12/11/20 9:19 AM, Gary Lin wrote:
The x64 bpf jit expects bpf images to converge within the given number of
passes, but it can fail to do so in some corner cases. For example:

        l0:     ldh [4]
        l1:     jeq #0x537d, l2, l40
        l2:     ld [0]
        l3:     jeq #0xfa163e0d, l4, l40
        l4:     ldh [12]
        l5:     ldx #0xe
        l6:     jeq #0x86dd, l41, l7
        l8:     ld [x+16]
        l9:     ja 41

          [... repeated ja 41 ]

        l40:    ja 41
        l41:    ret #0
        l42:    ld #len
        l43:    ret a

This bpf program contains 32 "ja 41" instructions which are effectively
NOPs and designed to be replaced with valid code dynamically. Ideally,
bpf jit should optimize those "ja 41" instructions out when translating
the bpf instructions into x86_64 machine code. However, do_jit() can
only remove one "ja 41" for offset==0 on each pass, so it requires at
least 32 runs to eliminate those JMPs and exceeds the current limit of
passes (20). In the end, the program got rejected when BPF_JIT_ALWAYS_ON
is set even though it's legit as a classic socket filter.
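
The one-removal-per-pass behaviour can be reproduced with a small model. This
is only a sketch, not the kernel's do_jit(): it assumes a run of BPF_JA
instructions that all target the label right after the last one, an initial
size estimate of 64 bytes per instruction, and jump sizes of 0 (dropped),
2 (rel8) or 5 (rel32) bytes. The names passes_to_converge, MAX_JUMPS and
INITIAL_INSN_SIZE are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_JUMPS 64
#define INITIAL_INSN_SIZE 64	/* assumed pessimistic estimate before pass 1 */

static bool is_imm8(int value)
{
	return value <= 127 && value >= -128;
}

/*
 * Model njumps consecutive "ja <end>" instructions. Each pass sizes every
 * jump from the offsets recorded in the previous pass. Returns the pass on
 * which the total length first equals the previous pass's length, or -1 if
 * that does not happen within max_passes.
 */
static int passes_to_converge(int njumps, int max_passes)
{
	int old_size[MAX_JUMPS], new_size[MAX_JUMPS];
	int old_len, pass, i;

	assert(njumps > 0 && njumps <= MAX_JUMPS);

	for (i = 0; i < njumps; i++)
		old_size[i] = INITIAL_INSN_SIZE;
	old_len = njumps * INITIAL_INSN_SIZE;

	for (pass = 1; pass <= max_passes; pass++) {
		int len = 0;
		int offset = 0;	/* old bytes between jump i and the target */

		/* Walk backwards so offset sums the old sizes of later jumps. */
		for (i = njumps - 1; i >= 0; i--) {
			if (offset == 0)
				new_size[i] = 0;	/* nop jump, dropped */
			else if (is_imm8(offset))
				new_size[i] = 2;	/* jmp rel8 */
			else
				new_size[i] = 5;	/* jmp rel32 */
			len += new_size[i];
			offset += old_size[i];
		}

		if (len == old_len)
			return pass;

		memcpy(old_size, new_size, njumps * sizeof(int));
		old_len = len;
	}

	return -1;
}
```

Under these assumptions only the last remaining jump sees a zero offset in any
given pass, so passes_to_converge(32, 20) reports failure, while a larger
limit lets the model settle.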

To make the image more likely to converge within 20 passes, this commit
pads some instructions with NOPs in the last 5 passes:

1. conditional jumps
    A possible size variance comes from the adoption of imm8 JMP. If the
    offset is imm8, we calculate the size difference of this BPF instruction
    between the previous pass and the current pass and fill the gap with NOPs.
    To avoid the recalculation of jump offset, those NOPs are inserted before
    the JMP code, so we have to subtract the 2 bytes of imm8 JMP when
    calculating the NOP number.

2. BPF_JA
    There are two cases for BPF_JA.
    a.) nop jumps
      If this instruction is not optimized out in the previous pass,
      instead of removing it, we insert the equivalent size of NOPs.
    b.) label jumps
      Similar to conditional jumps, we prepend NOPs right before the JMP
      code.
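
The NOP counts described in the list above can be written out as plain
arithmetic. This is a sketch under assumptions, not the patch itself: the
helper names and JMP_IMM8_LEN are invented here, prev_insn_len stands for the
size the BPF instruction had in the previous pass, and emitted_len for the
bytes already produced for it in the current pass (for example the compare
that precedes a conditional jump).

```c
#include <assert.h>

#define JMP_IMM8_LEN 2	/* opcode byte plus rel8 displacement */

/*
 * NOP bytes to place in front of an imm8 jump so that the whole BPF
 * instruction keeps the size it had in the previous pass.
 */
static int nops_before_imm8_jmp(int prev_insn_len, int emitted_len)
{
	int nops = prev_insn_len - emitted_len - JMP_IMM8_LEN;

	assert(nops >= 0);	/* padding can only fill a gap, never shrink */
	return nops;
}

/*
 * A BPF_JA whose offset became zero emits no jump at all, so the whole
 * previous size is replaced by NOPs.
 */
static int nops_for_nop_jmp(int prev_insn_len)
{
	return prev_insn_len;
}
```

For instance, a conditional jump that was an assumed 3-byte compare plus a
6-byte jcc rel32 in the previous pass and now fits a 2-byte jcc rel8 needs
9 - 3 - 2 = 4 bytes of NOPs before the jump.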

To make the code concise, emit_nops() is modified to use the signed len and
return the number of inserted NOPs.
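
As a rough illustration of that interface change, here is a simplified
stand-in. It assumes that a non-positive len means "no padding" and it only
emits single-byte 0x90 NOPs, whereas the kernel helper picks from a table of
multi-byte NOP encodings. The name emit_nops_sketch is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write len single-byte NOPs at *pprog, advance the cursor, and return the
 * number of bytes written. A zero or negative len writes nothing, which
 * lets a caller pass a computed size difference without checking it first.
 */
static int emit_nops_sketch(uint8_t **pprog, int len)
{
	uint8_t *prog = *pprog;
	int i;

	if (len <= 0)
		return 0;

	for (i = 0; i < len; i++)
		*prog++ = 0x90;	/* x86 one-byte NOP */

	*pprog = prog;
	return len;
}
```

Returning the count lets the caller add it straight to its running byte
counter for the instruction.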

To support bpf-to-bpf, a new flag, padded, is introduced to 'struct bpf_prog'
so that bpf_int_jit_compile() could know if the program is padded or not.

Please also add multiple hand-crafted test cases e.g. for bpf-to-bpf calls into
test_verifier (which is part of bpf kselftests) that would exercise this corner
case in x86 jit where we would start to nop pad so that there is proper coverage,
too.

The corner case I had in the commit description is likely being rejected by
the verifier because most of those "ja 41" are unreachable instructions.
Is there any known test case that needs more than 15 passes in x86 jit?

Just an idea. Besides the mentioned corner case, how about making
PADDING_PASSES dynamically configurable (sysfs?) and reusing the existing
test cases? Then we could have a script that sets PADDING_PASSES from 1 to 20
and runs the bpf selftests for each value. This guarantees that the padding
strategy will be applied in at least some of the PADDING_PASSES settings.

I think exposing such an implementation detail to users is not that great, as
they normally should not need to worry about these things (plus it's also
rarely hit in practice when developing against llvm). On top of all that, such
a knob would have no meaning for other JITs, since most non-x86 ones have a
fixed number of passes. I think it's probably useful for local testing of the
fix, but less suitable for exposing as sysctl 'uapi' upstream. Re crafting a
test case for bpf-2-bpf calls, you could orient yourself on
bpf_fill_maxinsns10() in lib/test_bpf.c, which also triggers a high number of
passes, port it over to test_verifier from selftests and experiment from there
to integrate calls.

Thanks,
Daniel


