On 9/14/2024 1:47 AM, Alexei Starovoitov wrote:
On Mon, Sep 9, 2024 at 7:42 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
On 9/9/24 20:08, Xu Kuohai wrote:
On 9/9/2024 6:38 PM, Leon Hwang wrote:
On 9/9/24 17:02, Xu Kuohai wrote:
On 9/8/2024 9:01 PM, Leon Hwang wrote:
On 1/9/24 21:38, Leon Hwang wrote:
Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the
same
issue happens on arm64, too.
[...]
Hi Puranjay and Kuohai,
As it's not recommended to introduce arch_bpf_run(), this is my
approach
to fix the niche case on arm64.
Do you have any better idea to fix it?
IIUC, the recommended appraoch is to teach verifier to reject the
freplace + tailcall combination. If this combiation is allowed, we
will face more than just this issue. For example, what happens if
a freplace prog is attached to tail callee? The freplace prog is not
reachable through the tail call, right?
It's to reject the freplace + tailcall combination partially, see "bpf,
x64: Fix tailcall infinite loop caused by freplace". (Oh, I should
separate the rejection to a standalone patch.)
It rejects the case that freplace prog has tailcall and its attach
target has no tailcall.
As for your example, it depends on:
freplace target reject?
Has tailcall? YES NO YES
Has tailcall? YES YES NO
Has tailcall? NO YES NO
Has tailcall? NO YES NO
Then, freplace prog can be tail callee always. I haven't seen any bad
case when freplace prog is tail callee.
Here is a concrete case. prog1 tail calls prog2, and prog2_new is
attached to prog2 via freplace.
SEC("tc")
int prog1(struct __sk_buff *skb)
{
bpf_tail_call_static(skb, &progs, 0); // tail call prog2
return 0;
}
SEC("tc")
int prog2(struct __sk_buff *skb)
{
return 0;
}
SEC("freplace")
int prog2_new(struct __sk_buff *skb) // target is prog2
{
return 0;
}
In this case, prog2_new is not reachable, since the tail call
target in prog2 is start address of prog2 + TAIL_CALL_OFFSET,
which locates behind freplace/fentry callsite of prog2.
This is an abnormal use case. We can do nothing with it, e.g. we're
unable to notify user that prog2_new is not reachable for this case.
Since it doesn't behave as the user would expect, I think, it's better
to disallow such combinations in the verifier.
Either freplace is trying to patch a prog that is already in prog_array
then the load of freplace prog can report a meaningful error into the
verifier log or
already freplaced prog is being added to prog array.
I think in this case main prog tail calling into this freplace prog
will actually succeed, but it's better to reject sys_bpf update
command in bpf_fd_array_map_update_elem(),
since being-freplaced prog is not in prog_array and won't be called,
but freplace prog in prog array can be called which is inconsistent.
freplace prog should act and be called just like target being-freplaced prog.
I don't think this will break any actual use cases where freplace and
tail call are used together.
+1, we should not ignore it silently. If the freplace + tailcall
combination can not be disallowed completely, we should disallow
this case separately (or maybe we could find an acceptable way to
make it work as expected?) . Let me give it a try.