Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls

Björn Töpel <bjorn@xxxxxxxxxx> · Tue, 30 Jan 2024 09:29:46 +0100

Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:

> On 2023/9/28 17:59, Björn Töpel wrote:
>> Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:
>> 
>>> From: Pu Lehui <pulehui@xxxxxxxxxx>
>>>
>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>> the TCC can be propagated from the parent process to the subprocess, but
>>> the TCC of the parent process cannot be restored when the subprocess
>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>> registers into the stack, we cannot use the callee saved register to
>>> pass the TCC, otherwise the original value of the callee saved register
>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>> the TCC between functions, and saving that register to the stack to
>>> protect the TCC value. At the same time, we also consider the scenario
>>> of mixing trampoline.
>> 
>> Hi!
>> 
>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>> fixed pro/epilogue like some of the other JITs. I think we can do better
>> here, so that the pass-TCC-via-register can be used, and the additional
>> stack access can be avoided.
>> 
>> Today, the TCC is passed via a register (a6) and can be viewed as a
>> "state" variable/transparent argument/return value. As you point out, we
>> lose this when we do a call. On (any) calls we move the TCC to a
>> callee-saved register.
>> 
>> WDYT about the following scheme:
>> 
>> 1 Pick up the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>    for the main program.
>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>    a6. Ditto for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>> 3 For all other calls, a6 is passed transparently.
>> 
>> For 2, bpf_jit_get_func_addr() can be used to determine if the callee is
>> a BPF helper or not.
>> 
>> In summary: determine in the JIT if we're leaving BPF-land, and need to
>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>> store/loads.
>> 
>
> Valuable scheme. But we need to consider TCC back propagation. Let me 
> show an example of calling subprog with TCC stored in A6:
>
> prog1(TCC==1){
>      subprog1(TCC==1)
>          -> tailcall1(TCC==0)
>              -> subprog2(TCC==0)
>      subprog3(TCC==0) <--- should be TCC==1
>          -\-> tailcall2 <--- can't be called
> }
>
> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and 
> subprog3. subprog1 calls tailcall1 and TCC becomes 0. tailcall1 calls
> subprog2 and then returns to prog1 with TCC at 0. At this time, subprog3
> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

Huh, I'm not following, and I don't see the issue. Help me out! You're
only allowed to do X tail calls "globally" for a BPF context, right? So
in the example you're outlining above, tailcall2 shouldn't be allowed to
be called.
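
To make my reading concrete, here is a toy C model of the trace above. It
is only a sketch, not JIT code: struct ctx_model, try_tail_call() and
replay_trace() are made-up names, and it assumes the simplified rule from
the example (a tail call is refused once TCC reaches 0) with a single
budget shared by the whole BPF context.

```c
#include <stdbool.h>

/* Hypothetical model, not JIT code: one tail-call budget per BPF context. */
struct ctx_model {
	int tcc; /* remaining tail calls, shared by every prog and subprog */
};

/* A tail call consumes one unit of the shared budget, or is refused. */
static bool try_tail_call(struct ctx_model *c)
{
	if (c->tcc <= 0)
		return false;
	c->tcc--;
	return true;
}

/* Replays the trace from the example; returns true if tailcall2 ran. */
static bool replay_trace(int initial_tcc)
{
	struct ctx_model c = { .tcc = initial_tcc };

	/* prog1 -> subprog1 -> tailcall1 -> subprog2, then back to prog1 */
	(void)try_tail_call(&c);

	/* prog1 -> subprog3 -> tailcall2 */
	return try_tail_call(&c);
}
```

With a budget of 1, replay_trace(1) returns false: tailcall1 used the only
slot, so tailcall2 is refused. That is the behaviour I would expect if the
limit is per context rather than per call chain.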

Björn