On Tue, Nov 30, 2021 at 10:40 PM Yonghong Song <yhs@xxxxxx> wrote: > > > > On 11/29/21 9:51 AM, Maxim Mikityanskiy wrote: > > On 2021-11-26 19:07, Yonghong Song wrote: > >> > >> > >> On 11/26/21 8:50 AM, Maxim Mikityanskiy wrote: > >>> On 2021-11-26 07:43, Yonghong Song wrote: > >>>> > >>>> > >>>> On 11/25/21 6:34 AM, Maxim Mikityanskiy wrote: > >>>>> On 2021-11-09 09:11, Yonghong Song wrote: > >>>>>> > >>>>>> > >>>>>> On 11/3/21 7:02 AM, Maxim Mikityanskiy wrote: > >>>>>>> On 2021-11-03 04:10, Yonghong Song wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> On 11/1/21 4:14 AM, Maxim Mikityanskiy wrote: > >>>>>>>>> On 2021-10-20 19:16, Toke Høiland-Jørgensen wrote: > >>>>>>>>>> Lorenz Bauer <lmb@xxxxxxxxxxxxxx> writes: > >>>>>>>>>> > >>>>>>>>>>>> +bool cookie_init_timestamp_raw(struct tcphdr *th, __be32 > >>>>>>>>>>>> *tsval, __be32 *tsecr) > >>>>>>>>>>> > >>>>>>>>>>> I'm probably missing context, Is there something in this > >>>>>>>>>>> function that > >>>>>>>>>>> means you can't implement it in BPF? > >>>>>>>>>> > >>>>>>>>>> I was about to reply with some other comments but upon closer > >>>>>>>>>> inspection > >>>>>>>>>> I ended up at the same conclusion: this helper doesn't seem to > >>>>>>>>>> be needed > >>>>>>>>>> at all? > >>>>>>>>> > >>>>>>>>> After trying to put this code into BPF (replacing the > >>>>>>>>> underlying ktime_get_ns with ktime_get_mono_fast_ns), I > >>>>>>>>> experienced issues with passing the verifier. > >>>>>>>>> > >>>>>>>>> In addition to comparing ptr to end, I had to add checks that > >>>>>>>>> compare ptr to data_end, because the verifier can't deduce that > >>>>>>>>> end <= data_end. More branches will add a certain slowdown (not > >>>>>>>>> measured). > >>>>>>>>> > >>>>>>>>> A more serious issue is the overall program complexity. Even > >>>>>>>>> though the loop over the TCP options has an upper bound, and > >>>>>>>>> the pointer advances by at least one byte every iteration, I > >>>>>>>>> had to limit the total number of iterations artificially. The > >>>>>>>>> maximum number of iterations that makes the verifier happy is > >>>>>>>>> 10. With more iterations, I have the following error: > >>>>>>>>> > >>>>>>>>> BPF program is too large. Processed 1000001 insn > >>>>>>>>> > >>>>>>>>> processed 1000001 insns (limit 1000000) > >>>>>>>>> max_states_per_insn 29 total_states 35489 peak_states 596 > >>>>>>>>> mark_read 45 > >>>>>>>>> > >>>>>>>>> I assume that BPF_COMPLEXITY_LIMIT_INSNS (1 million) is the > >>>>>>>>> accumulated amount of instructions that the verifier can > >>>>>>>>> process in all branches, is that right? It doesn't look > >>>>>>>>> realistic that my program can run 1 million instructions in a > >>>>>>>>> single run, but it might be that if you take all possible flows > >>>>>>>>> and add up the instructions from these flows, it will exceed 1 > >>>>>>>>> million. > >>>>>>>>> > >>>>>>>>> The limitation of maximum 10 TCP options might be not enough, > >>>>>>>>> given that valid packets are permitted to include more than 10 > >>>>>>>>> NOPs. An alternative of using bpf_load_hdr_opt and calling it > >>>>>>>>> three times doesn't look good either, because it will be about > >>>>>>>>> three times slower than going over the options once. So maybe > >>>>>>>>> having a helper for that is better than trying to fit it into BPF? > >>>>>>>>> > >>>>>>>>> One more interesting fact is the time that it takes for the > >>>>>>>>> verifier to check my program. If it's limited to 10 iterations, > >>>>>>>>> it does it pretty fast, but if I try to increase the number to > >>>>>>>>> 11 iterations, it takes several minutes for the verifier to > >>>>>>>>> reach 1 million instructions and print the error then. I also > >>>>>>>>> tried grouping the NOPs in an inner loop to count only 10 real > >>>>>>>>> options, and the verifier has been running for a few hours > >>>>>>>>> without any response. Is it normal? > >>>>>>>> > >>>>>>>> Maxim, this may expose a verifier bug. Do you have a reproducer > >>>>>>>> I can access? I would like to debug this to see what is the root > >>>>>>>> case. Thanks! > >>>>>>> > >>>>>>> Thanks, I appreciate your help in debugging it. The reproducer is > >>>>>>> based on the modified XDP program from patch 10 in this series. > >>>>>>> You'll need to apply at least patches 6, 7, 8 from this series to > >>>>>>> get new BPF helpers needed for the XDP program (tell me if that's > >>>>>>> a problem, I can try to remove usage of new helpers, but it will > >>>>>>> affect the program length and may produce different results in > >>>>>>> the verifier). > >>>>>>> > >>>>>>> See the C code of the program that passes the verifier (compiled > >>>>>>> with clang version 12.0.0-1ubuntu1) in the bottom of this email. > >>>>>>> If you increase the loop boundary from 10 to at least 11 in > >>>>>>> cookie_init_timestamp_raw(), it fails the verifier after a few > >>>>>>> minutes. > >>>>>> > >>>>>> I tried to reproduce with latest llvm (llvm-project repo), > >>>>>> loop boundary 10 is okay and 11 exceeds the 1M complexity limit. > >>>>>> For 10, > >>>>>> the number of verified instructions is 563626 (more than 0.5M) so > >>>>>> it is > >>>>>> totally possible that one more iteration just blows past the limit. > >>>>> > >>>>> So, does it mean that the verifying complexity grows exponentially > >>>>> with increasing the number of loop iterations (options parsed)? > >>>> > >>>> Depending on verification time pruning results, it is possible > >>>> slightly increase number of branches could result quite some (2x, > >>>> 4x, etc.) of > >>>> to-be-verified dynamic instructions. > >>> > >>> Is it at least theoretically possible to make this coefficient below > >>> 2x? I.e. write a loop, so that adding another iteration will not > >>> double the number of verified instructions, but will have a smaller > >>> increase? > >>> > >>> If that's not possible, then it looks like BPF can't have loops > >>> bigger than ~19 iterations (2^20 > 1M), and this function is not > >>> implementable in BPF. > >> > >> This is the worst case. As I mentioned pruning plays a huge role in > >> verification. Effective pruning can add little increase of dynamic > >> instructions say from 19 iterations to 20 iterations. But we have > >> to look at verifier log to find out whether pruning is less effective or > >> something else... Based on my experience, in most cases, pruning is > >> quite effective. But occasionally it is not... You can look at > >> verifier.c file to roughly understand how pruning work. > >> > >> Not sure whether in this case it is due to less effective pruning or > >> inherently we just have to go through all these dynamic instructions > >> for verification. > >> > >>> > >>>>> > >>>>> Is it a good enough reason to keep this code as a BPF helper, > >>>>> rather than trying to fit it into the BPF program? > >>>> > >>>> Another option is to use global function, which is verified separately > >>>> from the main bpf program. > >>> > >>> Simply removing __always_inline didn't change anything. Do I need to > >>> make any other changes? Will it make sense to call a global function > >>> in a loop, i.e. will it increase chances to pass the verifier? > >> > >> global function cannot be static function. You can try > >> either global function inside the loop or global function > >> containing the loop. It probably more effective to put loops > >> inside the global function. You have to do some experiments > >> to see which one is better. > > > > Sorry for a probably noob question, but how can I pass data_end to a > > global function? I'm getting this error: > > > > Validating cookie_init_timestamp_raw() func#1... > > arg#4 reference type('UNKNOWN ') size cannot be determined: -22 > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 > > peak_states 0 mark_read 0 > > > > When I removed data_end, I got another one: > > > > ; opcode = ptr[0]; > > 969: (71) r8 = *(u8 *)(r0 +0) > > R0=mem(id=0,ref_obj_id=0,off=20,imm=0) > > R1=mem(id=0,ref_obj_id=0,off=0,umin_value=4,umax_value=60,var_off=(0x0; > > 0x3f),s32_min_value=0,s32_max_value=63,u32_max_value=63) > > R2=invP0 R3=invP0 R4=mem_or_null(id=6,ref_obj_id=0,off=0,imm=0) > > R5=invP0 R6=mem_or_null(id=5,ref_obj_id=0,off=0,imm=0) > > R7=mem(id=0,ref_obj_id=0,off=0,imm=0) R10=fp0 fp > > -8=00000000 fp-16=invP15 > > invalid access to memory, mem_size=20 off=20 size=1 > > R0 min value is outside of the allowed memory range > > processed 20 insns (limit 1000000) max_states_per_insn 0 total_states 2 > > peak_states 2 mark_read 1 > > > > It looks like pointers to the context aren't supported: > > > > https://www.spinics.net/lists/bpf/msg34907.html > > > > > test_global_func11 - check that CTX pointer cannot be passed > > > > What is the standard way to pass packet data to a global function? > > Since global function is separately verified, you need to pass the 'ctx' > to the global function and do the 'data_end' check again in the global > function. This will incur some packet re-parsing overhead similar to > tail calls. Now that the bpf_loop() helper landed, it's another option for doing repeated work. Please see [0]. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=587497&state=* > > > > > Thanks, > > Max > > > >>> > >>>>> > >>>>>> > >>>>>>> If you apply this tiny change, it fails the verifier after about > >>>>>>> 3 hours: > >>>>>>> > >>>> [...] > >>> > >