On 6/12/23 07:03, Maciej Fijalkowski wrote: > On Wed, Oct 11, 2023 at 11:27:23PM +0800, Leon Hwang wrote: >> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall >> handling in JIT"), the tailcall on x64 works better than before. >> >> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms >> for x64 JIT"), tailcall is able to run in BPF subprograms on x64. >> >> How about: >> >> 1. More than 1 subprograms are called in a bpf program. >> 2. The tailcalls in the subprograms call the bpf program. >> >> Because of missing tail_call_cnt back-propagation, a tailcall hierarchy >> comes up. And MAX_TAIL_CALL_CNT limit does not work for this case. >> >> As we know, in tail call context, the tail_call_cnt propagates by stack >> and rax register between BPF subprograms. So, propagating tail_call_cnt >> pointer by stack and rax register makes tail_call_cnt as like a global >> variable, in order to make MAX_TAIL_CALL_CNT limit works for tailcall >> hierarchy cases. >> >> Before jumping to other bpf prog, load tail_call_cnt from the pointer >> and then compare with MAX_TAIL_CALL_CNT. Finally, increment >> tail_call_cnt by its pointer. >> >> But, where does tail_call_cnt store? >> >> It stores on the stack of bpf prog's caller, like >> >> | STACK | >> | | >> | rip | >> +->| tcc | >> | | rip | >> | | rbp | >> | +---------+ RBP >> | | | >> | | | >> | | | >> +--| tcc_ptr | >> | rbx | >> +---------+ RSP >> >> And tcc_ptr is unnecessary to be popped from stack at the epilogue of bpf >> prog, like the way of commit d207929d97ea028f ("bpf, x64: Drop "pop %rcx" >> instruction on BPF JIT epilogue"). >> >> Why not back-propagate tail_call_cnt? >> >> It's because it's vulnerable to back-propagate it. It's unable to work >> well with the following case. >> >> int prog1(); >> int prog2(); >> >> prog1 is tail caller, and prog2 is tail callee. If we do back-propagate >> tail_call_cnt at the epilogue of prog2, can prog2 run standalone at the >> same time? The answer is NO. Otherwise, there will be a register to be >> polluted, which will make kernel crash. > > Sorry but I keep on reading this explanation and I'm lost what is being > fixed here> > You want to limit the total amount of tail calls that entry prog can do to > MAX_TAIL_CALL_CNT. Although I was working on that, my knowledge here is > rusty, therefore my view might be distorted :) to me MAX_TAIL_CALL_CNT is > to protect us from overflowing kernel stack and endless loops. As long a > single call chain doesn't go over 8kB program is fine. Verifier has a > limit of 256 subprogs from what I see. I try to resolve the following-like cases here. +--------- tailcall --+ | | V --> subprog1 -+ entry bpf prog < A --> subprog2 -+ | | +--------- tailcall --+ Without this fixing, the CPU will be stalled because of too many tailcalls. > > Can you elaborate a bit more about the kernel crash you mention in the > last paragraph? We have progs, prog1, prog2, prog3 and prog4, and the scenario: case1: kprobe1 --> prog1 --> subprog1 --tailcall-> prog2 --> subprog2 --tailcall-> prog3 case2: kprobe2 --> prog2 --> subprog2 --tailcall-> prog3 case3: kprobe3 --> prog4 --> subprog3 --tailcall-> prog3 --> subprog4 --tailcall-> prog4 How does prog2 back-propagate tail_call_cnt to prog1? Possible way 1: When prog2 and prog3 are added to PROG_ARRAY, poke their epilogues to back-propagate tail_call_cnt by RCX register. It seems OK because kprobes do not handle the value in RCX register, like case2. Possible way 2: Can back-propagate tail_call_cnt with RCX register by checking tail_call_cnt != 0 at epilogue when current prog has tailcall? No. As for case1, prog2 handles the value in RCX register, which is not tail_call_cnt, because prog3 has no tailcall and won't populate RCX register with tail_call_cnt. However, I don't like the back-propagating way. Then, I "burn" my brain to figure out pointer propagating ways. RFC PATCH v1 way: Propagate tail_call_cnt and its pointer together. Then, the pointer is used to check MAX_TAIL_CALL_CNT and increment tail_call_cnt. | STACK | +---------+ RBP | | | | | | +--| tcc_ptr | +->| tcc | | rbx | +---------+ RSP RFC PATCH v2 way (current patchset): Propagate tail_call_cnt pointer only. Then, the pointer is used to check MAX_TAIL_CALL_CNT and increment tail_call_cnt. | STACK | | | | rip | +->| tcc | | | rip | | | rbp | | +---------+ RBP | | | | | | | | | +--| tcc_ptr | | rbx | +---------+ RSP > > I also realized that verifier assumes MAX_TAIL_CALL_CNT as 32 which has > changed in the meantime to 33 and we should adjust the max allowed stack > depth of subprogs? I believe this was brought up at LPC? There's following code snippet in verifier.c: /* protect against potential stack overflow that might happen when * bpf2bpf calls get combined with tailcalls. Limit the caller's stack * depth for such case down to 256 so that the worst case scenario * would result in 8k stack size (32 which is tailcall limit * 256 = * 8k). But, MAX_TAIL_CALL_CNT is 33. This was not brought up at LPC 2022&2023. I don't know whether this was brought up at previous LPCs. Thanks, Leon