Re: [PATCH bpf-next 09/10] bpf: Add a helper to issue timestamp cookies in XDP

Yonghong Song <yhs@xxxxxx> · Mon, 8 Nov 2021 23:11:53 -0800

On 11/3/21 7:02 AM, Maxim Mikityanskiy wrote:
On 2021-11-03 04:10, Yonghong Song wrote:


On 11/1/21 4:14 AM, Maxim Mikityanskiy wrote:
On 2021-10-20 19:16, Toke Høiland-Jørgensen wrote:
Lorenz Bauer <lmb@xxxxxxxxxxxxxx> writes:

+bool cookie_init_timestamp_raw(struct tcphdr *th, __be32 *tsval, 
__be32 *tsecr)

I'm probably missing context, Is there something in this function that
means you can't implement it in BPF?

I was about to reply with some other comments but upon closer 
inspection
I ended up at the same conclusion: this helper doesn't seem to be 
needed
at all?

After trying to put this code into BPF (replacing the underlying 
ktime_get_ns with ktime_get_mono_fast_ns), I experienced issues with 
passing the verifier.

In addition to comparing ptr to end, I had to add checks that compare 
ptr to data_end, because the verifier can't deduce that end <= 
data_end. More branches will add a certain slowdown (not measured).

A more serious issue is the overall program complexity. Even though 
the loop over the TCP options has an upper bound, and the pointer 
advances by at least one byte every iteration, I had to limit the 
total number of iterations artificially. The maximum number of 
iterations that makes the verifier happy is 10. With more iterations, 
I have the following error:

BPF program is too large. Processed 1000001 insn

                        processed 1000001 insns (limit 1000000) 
max_states_per_insn 29 total_states 35489 peak_states 596 mark_read 45

I assume that BPF_COMPLEXITY_LIMIT_INSNS (1 million) is the 
accumulated amount of instructions that the verifier can process in 
all branches, is that right? It doesn't look realistic that my 
program can run 1 million instructions in a single run, but it might 
be that if you take all possible flows and add up the instructions 
from these flows, it will exceed 1 million.

The limitation of maximum 10 TCP options might be not enough, given 
that valid packets are permitted to include more than 10 NOPs. An 
alternative of using bpf_load_hdr_opt and calling it three times 
doesn't look good either, because it will be about three times slower 
than going over the options once. So maybe having a helper for that 
is better than trying to fit it into BPF?

One more interesting fact is the time that it takes for the verifier 
to check my program. If it's limited to 10 iterations, it does it 
pretty fast, but if I try to increase the number to 11 iterations, it 
takes several minutes for the verifier to reach 1 million 
instructions and print the error then. I also tried grouping the NOPs 
in an inner loop to count only 10 real options, and the verifier has 
been running for a few hours without any response. Is it normal? 

Maxim, this may expose a verifier bug. Do you have a reproducer I can 
access? I would like to debug this to see what is the root case. Thanks!

Thanks, I appreciate your help in debugging it. The reproducer is based 
on the modified XDP program from patch 10 in this series. You'll need to 
apply at least patches 6, 7, 8 from this series to get new BPF helpers 
needed for the XDP program (tell me if that's a problem, I can try to 
remove usage of new helpers, but it will affect the program length and 
may produce different results in the verifier).

See the C code of the program that passes the verifier (compiled with 
clang version 12.0.0-1ubuntu1) in the bottom of this email. If you 
increase the loop boundary from 10 to at least 11 in 
cookie_init_timestamp_raw(), it fails the verifier after a few minutes. 

I tried to reproduce with latest llvm (llvm-project repo),
loop boundary 10 is okay and 11 exceeds the 1M complexity limit. For 10,
the number of verified instructions is 563626 (more than 0.5M) so it is
totally possible that one more iteration just blows past the limit.


If you apply this tiny change, it fails the verifier after about 3 hours:

--- a/samples/bpf/syncookie_kern.c
+++ b/samples/bpf/syncookie_kern.c
@@ -167,6 +167,7 @@ static __always_inline bool cookie_init_
      for (i = 0; i < 10; i++) {
          u8 opcode, opsize;

+skip_nop:
          if (ptr >= end)
              break;
          if (ptr + 1 > data_end)
@@ -178,7 +179,7 @@ static __always_inline bool cookie_init_
              break;
          if (opcode == TCPOPT_NOP) {
              ++ptr;
-            continue;
+            goto skip_nop;
          }

          if (ptr + 1 >= end)

I tried this as well, with latest llvm, and got the following errors
in ~30 seconds:

......
536: (79) r2 = *(u64 *)(r10 -96)
537: R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) 
R1=pkt(id=9,off=499,r=518,umax_value=60,var_off=(0x0; 0x3c)) 
R2=pkt_end(id=0,off=0,imm=0) 
R3=pkt(id=27,off=14,r=0,umin_value=20,umax_value=120,var_off=(0x0; 
0x7c),s32_min_value=0,s32_max_value=124,u32_max_value=124) R4=invP3 
R5=inv1 R6=ctx(id=0,off=0,imm=0) R7=pkt(id=9,off=519,r=518,umax_va^C
[yhs@devbig309.ftw3 ~/work/bpf-next/samples/bpf] tail -f log
550: (55) if r0 != 0x4 goto pc+4
The sequence of 8193 jumps is too complex.
verification time 30631375 usec
stack depth 160
processed 330595 insns (limit 1000000) max_states_per_insn 4 
total_states 20377 peak_states 100 mark_read 37

With llvm12, I got the similar verification error:

The sequence of 8193 jumps is too complex.
processed 330592 insns (limit 1000000) max_states_per_insn 4 
total_states 20378 peak_states 101 mark_read 37

Could you check again with your experiment which takes 3 hours to
fail? What is the verification failure log?


--cut--

// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights 
reserved. */

[...]