Re: [RFC PATCH bpf-next v4 0/4] bpf, x64: Fix tailcall infinite loop

Leon Hwang <hffilwlqm@xxxxxxxxx> · Wed, 6 Sep 2023 10:39:59 +0800

On 6/9/23 08:57, Ilya Leoshkevich wrote:
> On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
>>
>>
>> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
>>> On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
>>>> This patch series fixes a tailcall infinite loop on x64.
>>>>
>>>> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and
>>>> tailcall
>>>> handling in JIT"), the tailcall on x64 works better than before.
>>>>
>>>> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF
>>>> subprograms
>>>> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
>>>>
>>>> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF
>>>> program
>>>> to other BPF programs"), BPF program is able to trace other BPF
>>>> programs.
>>>>
>>>> How about combining them all together?
>>>>
>>>> 1. FENTRY/FEXIT on a BPF subprogram.
>>>> 2. A tailcall runs in the BPF subprogram.
>>>> 3. The tailcall calls the subprogram's caller.
>>>>
>>>> As a result, a tailcall infinite loop comes up. And the loop
>>>> would
>>>> halt
>>>> the machine.
>>>>
>>>> As we know, in tail call context, the tail_call_cnt propagates by
>>>> stack
>>>> and rax register between BPF subprograms. So do in trampolines.
>>>>
>>>> How did I discover the bug?
>>>>
>>>> From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth
>>>> 256
>>>> for
>>>> subprogs with tailcalls"), the total stack size limits to around
>>>> 8KiB.
>>>> Then, I write some bpf progs to validate the stack consuming,
>>>> that
>>>> are
>>>> tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on
>>>> bpf2bpf[1].
>>>>
>>>> At that time, accidently, I made a tailcall loop. And then the
>>>> loop
>>>> halted
>>>> my VM. Without the loop, the bpf progs would consume over 8KiB
>>>> stack
>>>> size.
>>>> But the _stack-overflow_ did not halt my VM.
>>>>
>>>> With bpf_printk(), I confirmed that the tailcall count limit did
>>>> not
>>>> work
>>>> expectedly. Next, read the code and fix it.
>>>>
>>>> Finally, unfortunately, I only fix it on x64 but other arches. As
>>>> a
>>>> result, CI tests failed because this bug hasn't been fixed on
>>>> s390x.
>>>>
>>>> Some helps on s390x are requested.
>>>
>>> I will take a look, thanks for letting me know.
>>
>> Great.
>>
>>> I noticed there was something peculiar in this area when
>>> implementing
>>> the trampoline:
>>>
>>>          * Note 1: The callee can increment the tail call counter,
>>> but
>>>          * we do not load it back, since the x86 JIT does not do
>>> this
>>>          * either.>
>>> but I thought that this was intentional.
>>
>> I do think so.
>>
>> But wait, should we load it back?
>>
>> Let me show a demo:
>>
>> struct {
>>         __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
>>         __uint(max_entries, 4);
>>         __uint(key_size, sizeof(__u32));
>>         __uint(value_size, sizeof(__u32));
>> } jmp_table SEC(".maps");
>>
>> static __noinline
>> int subprog_tail_01(struct __sk_buff *skb)
>> {
>>         if (load_byte(skb, 0))
>>                 bpf_tail_call_static(skb, &jmp_table, 1);
>>         else
>>                 bpf_tail_call_static(skb, &jmp_table, 0);
>>         return 1;
>> }
>>
>> static __noinline
>> int subprog_tail_23(struct __sk_buff *skb)
>> {
>>         if (load_byte(skb, 0))
>>                 bpf_tail_call_static(skb, &jmp_table, 3);
>>         else
>>                 bpf_tail_call_static(skb, &jmp_table, 2);
>>         return 1;
>> }
>>
>> int count0 = 0;
>>
>> SEC("tc")
>> int classifier_01(struct __sk_buff *skb)
>> {
>>         count0++;
>>         return subprog_tail_01(skb);
>> }
>>
>> int count1 = 0;
>>
>> SEC("tc")
>> int classifier_23(struct __sk_buff *skb)
>> {
>>         count1++;
>>         return subprog_tail_23(skb);
>> }
>>
>> static __noinline
>> int subprog_tailcall(struct __sk_buff *skb, int index)
>> {
>>         volatile int retval = 0;
>>         bpf_tail_call(skb, &jmp_table, index);
>>         return retval;
>> }
>>
>> SEC("tc")
>> int entry(struct __sk_buff *skb)
>> {
>>         subprog_tailcall(skb, 0);
>>         subprog_tailcall(skb, 2);
>>
>>         return 0;
>> }
>>
>> Finally, count0 is 33. And count1 is 33, too. It breaks the
>> MAX_TAIL_CALL_CNT limit by the way tailcall in subprog.
> 
> Thanks for coming up with an example, I could not create something like
> this back then and just left a note for the future.
> 
>> From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other
>> bpf
>> programs"):
>> The chain of tail calls can form unpredictable dynamic loops
>> therefore
>> tail_call_cnt is used to limit the number of calls and currently is
>> set
>> to 32.
>>
>> It seems like that MAX_TAIL_CALL_CNT limits the max tailcall
>> hierarchy
>> layers.
>>
>> So, should we load it back?
> 
> I wonder if the current behavior can lead to high system load in some
> cases. The current limit on the number of instructions is 1M; with tail
> calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique
> above to the fullest extent, i.e., on each tail call hierarchy level,
> can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a
> result?

Executing (2 ** MAX_TAIL_CALL_CNT) * 1M instructions is really harmful.

So, we should load it back.

I'll try to fix it with another patchset.

Thanks,
Leon