On 9/6/2024 11:24 PM, Alexei Starovoitov wrote:
On Fri, Sep 6, 2024 at 7:32 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
On 2024/9/5 17:13, Puranjay Mohan wrote:
Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> writes:
On 8/27/2024 10:23 AM, Leon Hwang wrote:
On 26/8/24 22:32, Xu Kuohai wrote:
On 8/25/2024 9:09 PM, Leon Hwang wrote:
Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the same
issue happens on arm64, too.
[...]
This patch makes arm64 jited prologue even more complex. I've posted a
series [1] to simplify the arm64 jited prologue/epilogue. I think we can
fix this issue based on [1]. I'll give it a try.
[1] https://lore.kernel.org/bpf/20240826071624.350108-1-xukuohai@xxxxxxxxxxxxxxx/
Your patch series looks great. We can base the fix on it.
Please let me know if your attempt succeeds.
I think the complexity arises from having to decide whether
to initialize or keep the tail counter value in the prologue.
To get rid of this complexity, a straightforward idea is to
move the tail call counter initialization to the entry of
bpf world, and in the bpf world, we only increase and check
the tail call counter, never save/restore or set it. The
"entry of the bpf world" here refers to mechanisms like
bpf_prog_run, bpf dispatcher, or bpf trampoline that
allow a bpf prog to be invoked from a C function.
Below is a rough POC diff for arm64 that could pass all
of your tests. The tail call counter is held in callee-saved
register x26, and is set to 0 by arch_run_bpf.
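To make the idea concrete, here is a minimal user-space sketch in plain C. It is not the POC diff and not kernel code: struct run_ctx, try_tail_call(), self_tail_calling_prog() and run_from_c() are hypothetical names invented for illustration, and the tcc field merely stands in for the callee-saved register. Only the limit name MAX_TAIL_CALL_CNT is borrowed from the kernel. The sketch assumes the counter is zeroed exactly once at the entry from C, and that code inside the "bpf world" only checks and increments it.

```c
/* Mirrors the kernel's tail call limit; everything else here is made up. */
#define MAX_TAIL_CALL_CNT 33

/* Hypothetical per-invocation state; tcc models the register holding the
 * tail call counter. */
struct run_ctx {
	unsigned int tcc;       /* tail call counter */
	unsigned int progs_run; /* number of program bodies executed */
};

/* Inside the "bpf world": only check and increment the counter,
 * never initialize, save, or restore it. */
static int try_tail_call(struct run_ctx *rc)
{
	if (rc->tcc >= MAX_TAIL_CALL_CNT)
		return 0; /* limit reached, fall through */
	rc->tcc++;
	return 1;
}

/* A stand-in program that tail-calls itself for as long as it is allowed.
 * A successful tail call replaces the running program, so a loop models it. */
static void self_tail_calling_prog(struct run_ctx *rc)
{
	do {
		rc->progs_run++;
	} while (try_tail_call(rc));
}

/* The "entry of the bpf world": the only place the counter is set to 0. */
static unsigned int run_from_c(void)
{
	struct run_ctx rc = { .tcc = 0, .progs_run = 0 };

	self_tail_calling_prog(&rc);
	return rc.progs_run;
}
```

In this model every call to run_from_c() executes the entry program plus at most MAX_TAIL_CALL_CNT tail-called programs, because nothing reachable from the program can reset the counter.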
I like this approach as it removes all the complexity of handling tcc in
different cases. Can we go ahead with this for arm64 and make
arch_run_bpf a weak function, so that other architectures can override
it if they want to use a similar approach, and skip implementing it if
they want to do something else?
I like this approach, too.
Hi Alexei,
What do you think about this idea?
This was discussed before and no, we're not going to add an extra tcc init
to bpf_prog_run and penalize everybody for this niche case.
+1, we should avoid hacking jit and adding complexity just for a niche case.