On 2024/5/17 02:56, Zvi Effron wrote:
> On Thu, May 16, 2024 at 8:28 AM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote:
>>
>> On 2024/5/9 23:05, Leon Hwang wrote:
>>> This patch fixes a tailcall issue caused by abusing the tailcall in
>>> bpf2bpf feature.
>>>
>>> [SNIP]
>>>
>>
>> Oh, I missed a case here.
>>
>> This patch set is unable to provide tcc_ptr for freplace programs that
>> use tail calls in bpf2bpf.
>>
>> How can this approach provide tcc_ptr for freplace programs?
>>
>> Achieving this is not straightforward. However, it is simpler to disable
>> the use of tail calls in bpf2bpf for freplace programs, even though this
>> is a desired feature for my project.
>>
>> Therefore, I will disable it in the v5 patch set.
>>
>
> Isn't this a breaking change such that it would effectively be a regression
> for any users already using tail_calls in bpf2bpf for freplace programs?
> And, correct me if I'm wrong, but aren't those pieces of eBPF essentially
> considered UAPI stable (unlike kfuncs)?

Yeah, this is a breaking change. However, I think it's acceptable, as
disabling tail_calls in subprogs has been considered before[0].

[0] https://lore.kernel.org/bpf/CAADnVQLOswL3BY1s0B28wRZH1PU675S6_2=XknjZKNgyJ=yDxw@xxxxxxxxxxxxxx/

>
> I appreciate that this is an esoteric use of eBPF, but as you said, you
> have a use case for it, as does my team (although we haven't had a chance
> to implement it yet), and if the two of us have use cases for it, I
> imagine others may have as well, and some of them might already have done
> their implementation.
>

It seems it is a useful feature for us. I haven't used it either, because
of an old kernel version.

So, I have figured out another approach to resolve this issue. Here's the
diff, just for idea discussion:

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5159c7a229229..b0b6c84874e54 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -273,7 +273,7 @@ struct jit_context {
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	(11 + ENDBR_INSN_SIZE)
+#define X86_TAIL_CALL_OFFSET	(22 + ENDBR_INSN_SIZE)
 
 static void push_r12(u8 **pprog)
 {
@@ -403,6 +403,22 @@ static void emit_cfi(u8 **pprog, u32 hash)
 	*pprog = prog;
 }
 
+static notrace void bpf_prepare_tail_call_cnt_ptr(void)
+{
+	/* %rax stores the position to call the original prog. */
+
+	asm (
+		"pushq %r9\n\t"		/* Push %r9. */
+		"movq %rax, %r9\n\t"	/* Cache calling position. */
+		"xor %eax, %eax\n\t"	/* Initialise tail_call_cnt. */
+		"pushq %rax\n\t"	/* Push tail_call_cnt. */
+		"movq %rsp, %rax\n\t"	/* Make %rax as tcc_ptr. */
+		"callq *%r9\n\t"	/* Call the original prog. */
+		"popq %r9\n\t"		/* Pop tail_call_cnt. */
+		"popq %r9\n\t"		/* Pop %r9. */
+	);
+}
+
 /*
  * Emit x86-64 prologue code for BPF program.
  * bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes
@@ -410,9 +426,9 @@ static void emit_cfi(u8 **pprog, u32 hash)
  */
 static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 			  bool tail_call_reachable, bool is_subprog,
-			  bool is_exception_cb)
+			  bool is_exception_cb, u8 *image)
 {
-	u8 *prog = *pprog;
+	u8 *prog = *pprog, *start = *pprog;
 
 	emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash);
 	/* BPF trampoline can be made to work without these nops,
@@ -420,14 +436,16 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 	 */
 	emit_nops(&prog, X86_PATCH_SIZE);
 	if (!ebpf_from_cbpf) {
-		if (tail_call_reachable && !is_subprog)
-			/* When it's the entry of the whole tailcall context,
-			 * zeroing rax means initialising tail_call_cnt.
-			 */
-			EMIT2(0x31, 0xC0); /* xor eax, eax */
-		else
+		if (tail_call_reachable && !is_subprog) {
+			/* mov rax, offset */
+			u32 offset = image + (prog - start) + 13;
+			EMIT4_off32(0x48, 0x8B, 0x04, 0x25, offset);
+			/* call bpf_prepare_tail_call_cnt_ptr */
+			emit_call(&prog, bpf_prepare_tail_call_cnt_ptr, offset - 5);
+		} else {
 			/* Keep the same instruction layout. */
-			EMIT2(0x66, 0x90); /* nop2 */
+			emit_nops(&prog, 13);
+		}
 	}
 	/* Exception callback receives FP as third parameter */
 	if (is_exception_cb) {
@@ -1344,7 +1362,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 
 	emit_prologue(&prog, bpf_prog->aux->stack_depth,
 		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
-		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
+		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb,
+		      image);
 	/* Exception callback will clobber callee regs for its own use, and
 	 * restore the original callee regs from main prog's stack frame.
 	 */

Unlike the current patch set, which prepares tcc_ptr in the bpf prog's
caller, this approach prepares tcc_ptr by calling a function from the
prologue: that function reserves tail_call_cnt memory on the stack and
then calls the remaining part of the bpf prog. At the end of the
prologue, rax is tcc_ptr, too.

This is inspired by the original RFC patch[0]. And, because the counter
is set up by a real function call, it avoids the unwind-breaking issue.

[0] https://lore.kernel.org/bpf/20240104142226.87869-3-hffilwlqm@xxxxxxxxx/

However, it introduces an indirect call in
bpf_prepare_tail_call_cnt_ptr(), which costs performance when retpoline
is enabled. To improve performance here, the bpf dispatcher should be
considered, like XDP does.

Thanks,
Leon
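
P.S. To make the tcc_ptr idea above concrete, here is a minimal userspace
C analogue. It is only a sketch, not kernel code; prog_body(), run_prog()
and the emulated tail-call bound are made up for illustration. The
wrapper plays the role of bpf_prepare_tail_call_cnt_ptr(): it reserves
the counter on its own stack frame and hands its address (tcc_ptr) to
the prog body, so every "tail call" in the chain shares one counter.

#include <stdio.h>

#define MAX_TAIL_CALL_CNT 33

/* Stands in for the bpf prog body past the prologue; receives tcc_ptr. */
static int prog_body(int ctx, int *tcc_ptr)
{
	/* Emulates the bpf_tail_call() bound: the whole chain shares the
	 * one counter living in the wrapper's stack frame.
	 */
	if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
		return -1;
	return prog_body(ctx, tcc_ptr);	/* "tail call" the next prog */
}

/* Stands in for bpf_prepare_tail_call_cnt_ptr(): reserve tail_call_cnt
 * on the stack and pass its address down, like the pushq/movq sequence
 * in the asm above.
 */
static int run_prog(int ctx)
{
	int tail_call_cnt = 0;

	return prog_body(ctx, &tail_call_cnt);
}

int main(void)
{
	/* The chain is cut off once the shared counter hits the bound. */
	printf("ret = %d\n", run_prog(0));
	return 0;
}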