Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:

> On 2023/9/28 17:59, Björn Töpel wrote:
>> Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:
>>
>>> From: Pu Lehui <pulehui@xxxxxxxxxx>
>>>
>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>> the TCC can be propagated from the parent process to the subprocess, but
>>> the TCC of the parent process cannot be restored when the subprocess
>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>> registers into the stack, we cannot use the callee saved register to
>>> pass the TCC, otherwise the original value of the callee saved register
>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>> the TCC between functions, and saving that register to the stack to
>>> protect the TCC value. At the same time, we also consider the scenario
>>> of mixing trampoline.
>>
>> Hi!
>>
>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>> fixed pro/epilogue like some of the other JITs. I think we can do better
>> here, so that the pass-TCC-via-register can be used, and the additional
>> stack access can be avoided.
>>
>> Today, the TCC is passed via a register (a6) and can be viewed as a
>> "state" variable/transparent argument/return value. As you point out, we
>> lose this when we do a call. On (any) calls we move the TCC to a
>> callee-saved register.
>>
>> WDYT about the following scheme:
>>
>> 1 Pick up the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>   for the main program.
>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>   a6. Ditto for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>> 3 For all other calls, a6 is passed transparently.
>>
>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>> a BPF helper or not.
>>
>> In summary: determine in the JIT if we're leaving BPF-land, and need to
>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>> store/loads.
>>
>
> Valuable scheme. But we need to consider TCC back propagation. Let me
> show an example of calling subprog with TCC stored in A6:
>
> prog1(TCC==1){
>     subprog1(TCC==1)
>         -> tailcall1(TCC==0)
>             -> subprog2(TCC==0)
>     subprog3(TCC==0) <--- should be TCC==1
>         -\-> tailcall2 <--- can't be called
> }
>
> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and
> subprog3. subprog1 calls tailcall1 and TCC becomes 0. tailcall1 calls
> subprog2 and then returns to prog1 with TCC at 0. At this time, subprog3
> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

Huh, I'm not following, and I don't see the issue. Help me out!

You're only allowed to do X tail calls "globally" for a BPF context,
right? So in the example you're outlining above, tailcall2 shouldn't be
allowed to be called.


Björn
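
To make the two readings of the example concrete, here is a toy C model. It is only a sketch and not JIT code: try_tail_call(), run_shared() and run_restored() are hypothetical names made up for illustration, and "tcc" is treated as the remaining tail-call budget, as in the quoted diagram.

```c
#include <stdbool.h>

/* Toy model: tcc is the remaining tail-call budget (hypothetical helper). */
static bool try_tail_call(int *tcc)
{
	if (*tcc == 0)
		return false;	/* budget exhausted, tail call is skipped */
	(*tcc)--;
	return true;
}

/*
 * Reading 1: the TCC travels transparently in one register (a6 in the
 * thread), so a decrement done below subprog1 is still visible in prog1.
 */
static bool run_shared(int tcc)
{
	try_tail_call(&tcc);		/* subprog1 -> tailcall1 */
	return try_tail_call(&tcc);	/* subprog3 -> tailcall2 */
}

/*
 * Reading 2: prog1 saves the TCC before the bpf2bpf call and restores
 * it afterwards, so subprog3 starts from prog1's original value.
 */
static bool run_restored(int tcc)
{
	int saved = tcc;

	try_tail_call(&tcc);		/* subprog1 -> tailcall1 */
	tcc = saved;			/* restore on return to prog1 */
	return try_tail_call(&tcc);	/* subprog3 -> tailcall2 */
}
```

With a starting budget of 1, run_shared(1) refuses tailcall2 (the "global per context" reading), while run_restored(1) allows it (the "should be TCC==1" reading in the diagram).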