On 2024/9/5 17:13, Puranjay Mohan wrote:
> Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> writes:
>
>> On 8/27/2024 10:23 AM, Leon Hwang wrote:
>>>
>>>
>>> On 26/8/24 22:32, Xu Kuohai wrote:
>>>> On 8/25/2024 9:09 PM, Leon Hwang wrote:
>>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
>>>>> issue happens on arm64, too.
>>>>>
>>>
>>> [...]
>>>
>>>>
>>>> This patch makes arm64 jited prologue even more complex. I've posted a
>>>> series [1] to simplify the arm64 jited prologue/epilogue. I think we can
>>>> fix this issue based on [1]. I'll give it a try.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/bpf/20240826071624.350108-1-xukuohai@xxxxxxxxxxxxxxx/
>>>>
>>>
>>> Your patch series seems great. We can fix it based on it.
>>>
>>> Please notify me if you have a successful try.
>>>
>>
>> I think the complexity arises from having to decide whether
>> to initialize or keep the tail counter value in the prologue.
>>
>> To get rid of this complexity, a straightforward idea is to
>> move the tail call counter initialization to the entry of
>> bpf world, and in the bpf world, we only increase and check
>> the tail call counter, never save/restore or set it. The
>> "entry of the bpf world" here refers to mechanisms like
>> bpf_prog_run, bpf dispatcher, or bpf trampoline that
>> allows bpf prog to be invoked from C function.
>>
>> Below is a rough POC diff for arm64 that could pass all
>> of your tests. The tail call counter is held in callee-saved
>> register x26, and is set to 0 by arch_run_bpf.
>
> I like this approach as it removes all the complexity of handling tcc in

I like this approach, too.

> different cases. Can we go ahead with this for arm64 and make
> arch_run_bpf a weak function and let other architectures override this
> if they want to use a similar approach to this and if other archs want to
> do something else they can skip implementing arch_run_bpf.
>

Hi Alexei,

What do you think about this idea?

Thanks,
Leon
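
P.S. To make the quoted idea concrete, here is a small user-space C model of
it. This is only an illustrative sketch, not the POC diff and not kernel code:
sim_ctx, sim_run_bpf, sim_tail_call_allowed and sim_prog_self_tail_call are
hypothetical names invented for this example. The tcc field stands in for the
counter the POC keeps in x26, and sim_run_bpf plays the role of arch_run_bpf,
the only place where the counter is set to 0. The limit of 33 is assumed to
match the kernel's MAX_TAIL_CALL_CNT.

```c
#include <assert.h>

/* Assumed to match the kernel's MAX_TAIL_CALL_CNT. */
#define MAX_TAIL_CALL_CNT 33u

/* Hypothetical model state; tcc stands in for the counter held in x26. */
struct sim_ctx {
	unsigned int tcc;   /* tail call counter */
	unsigned int runs;  /* number of program bodies executed */
};

/*
 * Inside the "bpf world": only check and increase the counter.
 * It is never reset, saved or restored here.
 */
static int sim_tail_call_allowed(struct sim_ctx *ctx)
{
	if (ctx->tcc >= MAX_TAIL_CALL_CNT)
		return 0;
	ctx->tcc++;
	return 1;
}

/* A program that keeps tail-calling itself; the loop models the jump. */
static void sim_prog_self_tail_call(struct sim_ctx *ctx)
{
	for (;;) {
		ctx->runs++;
		if (!sim_tail_call_allowed(ctx))
			return;
	}
}

/*
 * Entry of the "bpf world" (models the role of arch_run_bpf):
 * the single place where the counter is initialized.
 */
static unsigned int sim_run_bpf(void (*prog)(struct sim_ctx *))
{
	struct sim_ctx ctx = { .tcc = 0, .runs = 0 };

	prog(&ctx);
	assert(ctx.tcc <= MAX_TAIL_CALL_CNT);
	return ctx.runs;
}
```

In this model, sim_run_bpf(sim_prog_self_tail_call) stops after
MAX_TAIL_CALL_CNT + 1 program runs (the initial run plus 33 tail calls), and
every new entry starts again from 0, since no program body ever has to decide
whether to initialize or keep the counter.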