Re: [RFC PATCH bpf-next v4 0/4] bpf, x64: Fix tailcall infinite loop

Ilya Leoshkevich <iii@xxxxxxxxxxxxx> · Wed, 06 Sep 2023 02:57:17 +0200

On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
> 
> 
> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
> > On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> > > This patch series fixes a tailcall infinite loop on x64.
> > > 
> > > From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and
> > > tailcall
> > > handling in JIT"), the tailcall on x64 works better than before.
> > > 
> > > From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF
> > > subprograms
> > > for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
> > > 
> > > From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF
> > > program
> > > to other BPF programs"), BPF program is able to trace other BPF
> > > programs.
> > > 
> > > How about combining them all together?
> > > 
> > > 1. FENTRY/FEXIT on a BPF subprogram.
> > > 2. A tailcall runs in the BPF subprogram.
> > > 3. The tailcall calls the subprogram's caller.
> > > 
> > > As a result, a tailcall infinite loop comes up. And the loop
> > > would
> > > halt
> > > the machine.
> > > 
> > > As we know, in tail call context, the tail_call_cnt propagates by
> > > stack
> > > and rax register between BPF subprograms. So do in trampolines.
> > > 
> > > How did I discover the bug?
> > > 
> > > From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth
> > > 256
> > > for
> > > subprogs with tailcalls"), the total stack size limits to around
> > > 8KiB.
> > > Then, I write some bpf progs to validate the stack consuming,
> > > that
> > > are
> > > tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on
> > > bpf2bpf[1].
> > > 
> > > At that time, accidently, I made a tailcall loop. And then the
> > > loop
> > > halted
> > > my VM. Without the loop, the bpf progs would consume over 8KiB
> > > stack
> > > size.
> > > But the _stack-overflow_ did not halt my VM.
> > > 
> > > With bpf_printk(), I confirmed that the tailcall count limit did
> > > not
> > > work
> > > expectedly. Next, read the code and fix it.
> > > 
> > > Finally, unfortunately, I only fix it on x64 but other arches. As
> > > a
> > > result, CI tests failed because this bug hasn't been fixed on
> > > s390x.
> > > 
> > > Some helps on s390x are requested.
> > 
> > I will take a look, thanks for letting me know.
> 
> Great.
> 
> > I noticed there was something peculiar in this area when
> > implementing
> > the trampoline:
> > 
> >          * Note 1: The callee can increment the tail call counter,
> > but
> >          * we do not load it back, since the x86 JIT does not do
> > this
> >          * either.>
> > but I thought that this was intentional.
> 
> I do think so.
> 
> But wait, should we load it back?
> 
> Let me show a demo:
> 
> struct {
>         __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
>         __uint(max_entries, 4);
>         __uint(key_size, sizeof(__u32));
>         __uint(value_size, sizeof(__u32));
> } jmp_table SEC(".maps");
> 
> static __noinline
> int subprog_tail_01(struct __sk_buff *skb)
> {
>         if (load_byte(skb, 0))
>                 bpf_tail_call_static(skb, &jmp_table, 1);
>         else
>                 bpf_tail_call_static(skb, &jmp_table, 0);
>         return 1;
> }
> 
> static __noinline
> int subprog_tail_23(struct __sk_buff *skb)
> {
>         if (load_byte(skb, 0))
>                 bpf_tail_call_static(skb, &jmp_table, 3);
>         else
>                 bpf_tail_call_static(skb, &jmp_table, 2);
>         return 1;
> }
> 
> int count0 = 0;
> 
> SEC("tc")
> int classifier_01(struct __sk_buff *skb)
> {
>         count0++;
>         return subprog_tail_01(skb);
> }
> 
> int count1 = 0;
> 
> SEC("tc")
> int classifier_23(struct __sk_buff *skb)
> {
>         count1++;
>         return subprog_tail_23(skb);
> }
> 
> static __noinline
> int subprog_tailcall(struct __sk_buff *skb, int index)
> {
>         volatile int retval = 0;
>         bpf_tail_call(skb, &jmp_table, index);
>         return retval;
> }
> 
> SEC("tc")
> int entry(struct __sk_buff *skb)
> {
>         subprog_tailcall(skb, 0);
>         subprog_tailcall(skb, 2);
> 
>         return 0;
> }
> 
> Finally, count0 is 33. And count1 is 33, too. It breaks the
> MAX_TAIL_CALL_CNT limit by the way tailcall in subprog.

Thanks for coming up with an example, I could not create something like
this back then and just left a note for the future.

> From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other
> bpf
> programs"):
> The chain of tail calls can form unpredictable dynamic loops
> therefore
> tail_call_cnt is used to limit the number of calls and currently is
> set
> to 32.
> 
> It seems like that MAX_TAIL_CALL_CNT limits the max tailcall
> hierarchy
> layers.
> 
> So, should we load it back?

I wonder if the current behavior can lead to high system load in some
cases. The current limit on the number of instructions is 1M; with tail
calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique
above to the fullest extent, i.e., on each tail call hierarchy level,
can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a
result?

> Thanks,
> Leon