On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote: > > > On 2023/9/4 21:10, Ilya Leoshkevich wrote: > > On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote: > > > This patch series fixes a tailcall infinite loop on x64. > > > > > > From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and > > > tailcall > > > handling in JIT"), the tailcall on x64 works better than before. > > > > > > From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF > > > subprograms > > > for x64 JIT"), tailcall is able to run in BPF subprograms on x64. > > > > > > From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF > > > program > > > to other BPF programs"), BPF program is able to trace other BPF > > > programs. > > > > > > How about combining them all together? > > > > > > 1. FENTRY/FEXIT on a BPF subprogram. > > > 2. A tailcall runs in the BPF subprogram. > > > 3. The tailcall calls the subprogram's caller. > > > > > > As a result, a tailcall infinite loop comes up. And the loop > > > would > > > halt > > > the machine. > > > > > > As we know, in tail call context, the tail_call_cnt propagates by > > > stack > > > and rax register between BPF subprograms. So do in trampolines. > > > > > > How did I discover the bug? > > > > > > From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth > > > 256 > > > for > > > subprogs with tailcalls"), the total stack size limits to around > > > 8KiB. > > > Then, I write some bpf progs to validate the stack consuming, > > > that > > > are > > > tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on > > > bpf2bpf[1]. > > > > > > At that time, accidently, I made a tailcall loop. And then the > > > loop > > > halted > > > my VM. Without the loop, the bpf progs would consume over 8KiB > > > stack > > > size. > > > But the _stack-overflow_ did not halt my VM. > > > > > > With bpf_printk(), I confirmed that the tailcall count limit did > > > not > > > work > > > expectedly. Next, read the code and fix it. > > > > > > Finally, unfortunately, I only fix it on x64 but other arches. As > > > a > > > result, CI tests failed because this bug hasn't been fixed on > > > s390x. > > > > > > Some helps on s390x are requested. > > > > I will take a look, thanks for letting me know. > > Great. > > > I noticed there was something peculiar in this area when > > implementing > > the trampoline: > > > > * Note 1: The callee can increment the tail call counter, > > but > > * we do not load it back, since the x86 JIT does not do > > this > > * either.> > > but I thought that this was intentional. > > I do think so. > > But wait, should we load it back? > > Let me show a demo: > > struct { > __uint(type, BPF_MAP_TYPE_PROG_ARRAY); > __uint(max_entries, 4); > __uint(key_size, sizeof(__u32)); > __uint(value_size, sizeof(__u32)); > } jmp_table SEC(".maps"); > > static __noinline > int subprog_tail_01(struct __sk_buff *skb) > { > if (load_byte(skb, 0)) > bpf_tail_call_static(skb, &jmp_table, 1); > else > bpf_tail_call_static(skb, &jmp_table, 0); > return 1; > } > > static __noinline > int subprog_tail_23(struct __sk_buff *skb) > { > if (load_byte(skb, 0)) > bpf_tail_call_static(skb, &jmp_table, 3); > else > bpf_tail_call_static(skb, &jmp_table, 2); > return 1; > } > > int count0 = 0; > > SEC("tc") > int classifier_01(struct __sk_buff *skb) > { > count0++; > return subprog_tail_01(skb); > } > > int count1 = 0; > > SEC("tc") > int classifier_23(struct __sk_buff *skb) > { > count1++; > return subprog_tail_23(skb); > } > > static __noinline > int subprog_tailcall(struct __sk_buff *skb, int index) > { > volatile int retval = 0; > bpf_tail_call(skb, &jmp_table, index); > return retval; > } > > SEC("tc") > int entry(struct __sk_buff *skb) > { > subprog_tailcall(skb, 0); > subprog_tailcall(skb, 2); > > return 0; > } > > Finally, count0 is 33. And count1 is 33, too. It breaks the > MAX_TAIL_CALL_CNT limit by the way tailcall in subprog. Thanks for coming up with an example, I could not create something like this back then and just left a note for the future. > From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other > bpf > programs"): > The chain of tail calls can form unpredictable dynamic loops > therefore > tail_call_cnt is used to limit the number of calls and currently is > set > to 32. > > It seems like that MAX_TAIL_CALL_CNT limits the max tailcall > hierarchy > layers. > > So, should we load it back? I wonder if the current behavior can lead to high system load in some cases. The current limit on the number of instructions is 1M; with tail calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique above to the fullest extent, i.e., on each tail call hierarchy level, can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a result? > Thanks, > Leon