On 6/9/23 08:57, Ilya Leoshkevich wrote: > On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote: >> >> >> On 2023/9/4 21:10, Ilya Leoshkevich wrote: >>> On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote: >>>> This patch series fixes a tailcall infinite loop on x64. >>>> >>>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and >>>> tailcall >>>> handling in JIT"), the tailcall on x64 works better than before. >>>> >>>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF >>>> subprograms >>>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64. >>>> >>>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF >>>> program >>>> to other BPF programs"), BPF program is able to trace other BPF >>>> programs. >>>> >>>> How about combining them all together? >>>> >>>> 1. FENTRY/FEXIT on a BPF subprogram. >>>> 2. A tailcall runs in the BPF subprogram. >>>> 3. The tailcall calls the subprogram's caller. >>>> >>>> As a result, a tailcall infinite loop comes up. And the loop >>>> would >>>> halt >>>> the machine. >>>> >>>> As we know, in tail call context, the tail_call_cnt propagates by >>>> stack >>>> and rax register between BPF subprograms. So do in trampolines. >>>> >>>> How did I discover the bug? >>>> >>>> From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth >>>> 256 >>>> for >>>> subprogs with tailcalls"), the total stack size limits to around >>>> 8KiB. >>>> Then, I write some bpf progs to validate the stack consuming, >>>> that >>>> are >>>> tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on >>>> bpf2bpf[1]. >>>> >>>> At that time, accidently, I made a tailcall loop. And then the >>>> loop >>>> halted >>>> my VM. Without the loop, the bpf progs would consume over 8KiB >>>> stack >>>> size. >>>> But the _stack-overflow_ did not halt my VM. >>>> >>>> With bpf_printk(), I confirmed that the tailcall count limit did >>>> not >>>> work >>>> expectedly. Next, read the code and fix it. >>>> >>>> Finally, unfortunately, I only fix it on x64 but other arches. As >>>> a >>>> result, CI tests failed because this bug hasn't been fixed on >>>> s390x. >>>> >>>> Some helps on s390x are requested. >>> >>> I will take a look, thanks for letting me know. >> >> Great. >> >>> I noticed there was something peculiar in this area when >>> implementing >>> the trampoline: >>> >>> * Note 1: The callee can increment the tail call counter, >>> but >>> * we do not load it back, since the x86 JIT does not do >>> this >>> * either.> >>> but I thought that this was intentional. >> >> I do think so. >> >> But wait, should we load it back? >> >> Let me show a demo: >> >> struct { >> __uint(type, BPF_MAP_TYPE_PROG_ARRAY); >> __uint(max_entries, 4); >> __uint(key_size, sizeof(__u32)); >> __uint(value_size, sizeof(__u32)); >> } jmp_table SEC(".maps"); >> >> static __noinline >> int subprog_tail_01(struct __sk_buff *skb) >> { >> if (load_byte(skb, 0)) >> bpf_tail_call_static(skb, &jmp_table, 1); >> else >> bpf_tail_call_static(skb, &jmp_table, 0); >> return 1; >> } >> >> static __noinline >> int subprog_tail_23(struct __sk_buff *skb) >> { >> if (load_byte(skb, 0)) >> bpf_tail_call_static(skb, &jmp_table, 3); >> else >> bpf_tail_call_static(skb, &jmp_table, 2); >> return 1; >> } >> >> int count0 = 0; >> >> SEC("tc") >> int classifier_01(struct __sk_buff *skb) >> { >> count0++; >> return subprog_tail_01(skb); >> } >> >> int count1 = 0; >> >> SEC("tc") >> int classifier_23(struct __sk_buff *skb) >> { >> count1++; >> return subprog_tail_23(skb); >> } >> >> static __noinline >> int subprog_tailcall(struct __sk_buff *skb, int index) >> { >> volatile int retval = 0; >> bpf_tail_call(skb, &jmp_table, index); >> return retval; >> } >> >> SEC("tc") >> int entry(struct __sk_buff *skb) >> { >> subprog_tailcall(skb, 0); >> subprog_tailcall(skb, 2); >> >> return 0; >> } >> >> Finally, count0 is 33. And count1 is 33, too. It breaks the >> MAX_TAIL_CALL_CNT limit by the way tailcall in subprog. > > Thanks for coming up with an example, I could not create something like > this back then and just left a note for the future. > >> From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other >> bpf >> programs"): >> The chain of tail calls can form unpredictable dynamic loops >> therefore >> tail_call_cnt is used to limit the number of calls and currently is >> set >> to 32. >> >> It seems like that MAX_TAIL_CALL_CNT limits the max tailcall >> hierarchy >> layers. >> >> So, should we load it back? > > I wonder if the current behavior can lead to high system load in some > cases. The current limit on the number of instructions is 1M; with tail > calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique > above to the fullest extent, i.e., on each tail call hierarchy level, > can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a > result? Executing (2 ** MAX_TAIL_CALL_CNT) * 1M instructions is really harmful. So, we should load it back. I'll try to fix it with another patchset. Thanks, Leon