This patchset fixes a tailcall hierarchy issue.

The issue was confirmed in the discussion of "bpf, x64: Fix tailcall
infinite loop"[0].

The issue has been resolved on both x86_64 and arm64[1].

The commit message of the third patch describes in detail how the issue
happens and how this patchset resolves it.

How does this patchset resolve the issue?

In short, it stores tail_call_cnt on the stack of the bpf prog's caller.

First, the bpf prog's caller has to zero tail_call_cnt on its stack and
prepare the tail call run ctx by wrapping the original ctx and a pointer
to tail_call_cnt. Next, it calls the bpf prog with the tail call run ctx
as the first arg.

Then, the prologue of the bpf prog has to cache the tail_call_cnt pointer
and, meanwhile, restore the original ctx as the first arg.

Furthermore, when a trampoline is the caller of the bpf prog, it has to
prepare tail_call_cnt and the tail call run ctx on its own stack.

v3 -> v4:
  * Solution changes from per-task tail_call_cnt to tailcall run ctx.
    As for the per-cpu/per-task solution, there is a case it is unable
    to handle[2].

v2 -> v3:
  * Solution changes from percpu tail_call_cnt to tail_call_cnt at
    task_struct.

v1 -> v2:
  * Solution changes from extra run-time call insn to percpu
    tail_call_cnt.
  * Address comments from Alexei:
    * Use percpu tail_call_cnt.
    * Use asm to make sure no callee saved registers are touched.

RFC v2 -> v1:
  * Solution changes from propagating tail_call_cnt with its pointer to
    extra run-time call insn.
  * Address comments from Maciej:
    * Replace all memcpy(prog, x86_nops[5], X86_PATCH_SIZE) with
      emit_nops(&prog, X86_PATCH_SIZE)

RFC v1 -> RFC v2:
  * Address comments from Stanislav:
    * Separate moving emit_nops() as first patch.

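
For readers who want a concrete picture of the tail call run ctx idea
described in this cover letter, here is a minimal user-space sketch in C.
It is not the JIT code from the patches. Every name in it
(sketch_tail_call_run_ctx, sketch_prologue, sketch_run_prog, and so on)
and the limit value are hypothetical and chosen for illustration only.
It models the caller zeroing the counter on its own stack, wrapping the
original ctx and the counter pointer, and the prologue unwrapping them so
that every tail call attempt in the hierarchy shares one counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative limit for this sketch only. */
#define SKETCH_MAX_TAIL_CALLS 33

/* Hypothetical wrapper: the original ctx plus a pointer to the counter. */
struct sketch_tail_call_run_ctx {
	void *ctx;
	uint32_t *tail_call_cnt_ptr;
};

/* What the modelled prologue keeps after unwrapping the first arg. */
struct sketch_prog_state {
	void *ctx;          /* restored original ctx */
	uint32_t *tcc_ptr;  /* cached pointer to the caller's counter */
};

typedef int (*sketch_prog_fn)(void *first_arg);

/* Prologue model: cache the counter pointer, restore the original ctx. */
static struct sketch_prog_state sketch_prologue(void *first_arg)
{
	struct sketch_tail_call_run_ctx *run_ctx = first_arg;
	struct sketch_prog_state st = {
		.ctx = run_ctx->ctx,
		.tcc_ptr = run_ctx->tail_call_cnt_ptr,
	};

	return st;
}

/* One tail call attempt: allowed only while the shared counter is low. */
static int sketch_try_tail_call(uint32_t *tcc_ptr)
{
	if (*tcc_ptr >= SKETCH_MAX_TAIL_CALLS)
		return 0;
	(*tcc_ptr)++;
	return 1;
}

/* A "subprog" that shares the counter through the cached pointer. */
static int sketch_subprog(uint32_t *tcc_ptr)
{
	return sketch_try_tail_call(tcc_ptr);
}

/* Entry prog: many subprog calls, each attempting one tail call. */
static int sketch_prog(void *first_arg)
{
	struct sketch_prog_state st = sketch_prologue(first_arg);
	int *result = st.ctx;
	int allowed = 0;

	for (int i = 0; i < 100; i++)
		allowed += sketch_subprog(st.tcc_ptr);

	*result = allowed;
	return allowed;
}

/* Caller model: counter lives on this stack frame and starts at zero. */
static int sketch_run_prog(sketch_prog_fn prog, void *ctx)
{
	uint32_t tail_call_cnt = 0;
	struct sketch_tail_call_run_ctx run_ctx = {
		.ctx = ctx,
		.tail_call_cnt_ptr = &tail_call_cnt,
	};

	return prog(&run_ctx);
}
```

Calling sketch_run_prog(sketch_prog, &some_int) twice gives the same
bounded result both times in this model, because each run gets a fresh
counter on the caller's stack, while all attempts within a single run are
limited together through the shared pointer.
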
Links:
[0] https://lore.kernel.org/bpf/6203dd01-789d-f02c-5293-def4c1b18aef@xxxxxxxxx/
[1] https://github.com/kernel-patches/bpf/pull/6999/checks
[2] https://lore.kernel.org/bpf/CAADnVQK1qF+uBjwom2s2W-yEmgd_3rGi5Nr+KiV3cW0T+UPPfA@xxxxxxxxxxxxxx/

Leon Hwang (5):
  bpf, verifier: Correct tail_call_reachable when no tailcall in subprog
  bpf: Introduce bpf_jit_supports_tail_call_cnt_ptr()
  bpf, x64: Fix tailcall hierarchy
  bpf, arm64: Fix tailcall hierarchy
  selftests/bpf: Add testcases for tailcall hierarchy fixing

 arch/arm64/net/bpf_jit_comp.c                 |  63 ++-
 arch/x86/net/bpf_jit_comp.c                   | 101 ++--
 include/linux/bpf.h                           |   8 +
 include/linux/filter.h                        |  13 +-
 kernel/bpf/core.c                             |  19 +
 kernel/bpf/verifier.c                         |   2 +-
 .../selftests/bpf/prog_tests/tailcalls.c      | 479 ++++++++++++++++++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy1.c   |  34 ++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy2.c   |  55 ++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy3.c   |  46 ++
 .../progs/tailcall_bpf2bpf_hierarchy_fentry.c |  35 ++
 tools/testing/selftests/bpf/progs/tc_dummy.c  |  12 +
 12 files changed, 817 insertions(+), 50 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tc_dummy.c

base-commit: 2a4c29ba6900228ed7029eb7dedf833e47338644
-- 
2.44.0