Re: test_kmod.sh fails with constant blinding

Eduard Zingerman <eddyz87@xxxxxxxxx> · Wed, 03 Jan 2024 00:39:21 +0200

On Tue, 2024-01-02 at 11:41 -0800, Yonghong Song wrote:
> On 1/2/24 9:47 AM, Eduard Zingerman wrote:
> > On Tue, 2024-01-02 at 08:56 -0800, Yonghong Song wrote:
> > > On 1/2/24 7:11 AM, Bram Schuur wrote:
> > > > My colleague Jan-Gerd Tenberge and I encountered this issue in production on the 5.15, 6.1 and 6.2 kernel versions. We made a small reproducer that might help find the root cause:
> > > > 
> > > > simple_repo.c:
> > > > 
> > > > #include <linux/bpf.h>
> > > > #include <bpf/bpf_helpers.h>
> > > > 
> > > > SEC("socket")
> > > > int socket__http_filter(struct __sk_buff* skb) {
> > > >     volatile __u32 r = bpf_get_prandom_u32();
> > > >     if (r == 0) {
> > > >       goto done;
> > > >     }
> > > > 
> > > > 
> > > > #pragma clang loop unroll(full)
> > > >     for (int i = 0; i < 12000; i++) {
> > > >       r += 1;
> > > >     }
> > > > 
> > > > #pragma clang loop unroll(full)
> > > >     for (int i = 0; i < 12000; i++) {
> > > >       r += 1;
> > > >     }
> > > > done:
> > > >     return r;
> > > > }
> > > > 
> > > > Looking at kernel/bpf/core.c, it seems that during constant blinding every instruction that has a constant operand gets 2 additional instructions. This increases the number of instructions between the JMP and the target of the JMP, causing the rewrite of the JMP to fail because the offset becomes bigger than S16_MAX.
> > > This is indeed possible, as the verifier might increase the insn count in various cases.
> > > -mcpu=v4 is designed to solve this problem, but it is only available in 6.6 and above.
> > There might be situations when -mcpu=v4 won't help, as currently llvm
> > would generate long jumps only when it knows at compile time that jump
> > is indeed long. However here constant blinding would probably triple
> > the size of the loop body, so for llvm this jump won't be long.
> > 
> > If we consider this corner case an issue, it might be possible to fix
> 
> This is definitely a corner case. Full unroll is not what we recommend, although
> we do try to accommodate it with cpuv4.
> 
> > it by teaching bpf_jit_blind_constants() to insert 'BPF_JMP32 | BPF_JA'
> > when jump targets cross the 2**16 thresholds.
> > Wdyt?
> 
> If we indeed hit an issue with cpuv4, I prefer to fix it on the llvm side.
> Currently, gotol is generated if the offset is >= S16_MAX/2 or <= S16_MIN/2.
> We could shrink that range further or always emit gotol, since quite a
> few architectures support gotol now (x86, arm, riscv, ppc, etc.).
> 

I tried building this program as v3 and as v4 using the following
command line:

  clang -O2 --target=bpf -c t.c -mcpu=<v3 or v4> -o t.o

(I copied definitions of SEC and bpf_get_prandom_u32 from bpf_helper_defs.h).

With the following results:
- when built as v4, the program compiles, gotol is generated, and the
  program can be loaded even when bpf_jit_harden is set:
  "echo 2 > /proc/sys/net/core/bpf_jit_harden"
  (as far as I understand this is sufficient to request constant blinding);
- when built as v3, clang exits with an error message (both distro clang-16
  and my local build of clang-18):
  "fatal error: error in backend: Branch target out of insn range"
  so I'm curious which flags were used by Bram.
- Also, the program cannot be compiled when -g is specified:
  on my machine with 32G of RAM clang consumes all available RAM
  (w/o -g "only" 155 MB of RAM is used).