On Mon, Sep 9, 2024 at 7:42 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote: > > > > On 9/9/24 20:08, Xu Kuohai wrote: > > On 9/9/2024 6:38 PM, Leon Hwang wrote: > >> > >> > >> On 9/9/24 17:02, Xu Kuohai wrote: > >>> On 9/8/2024 9:01 PM, Leon Hwang wrote: > >>>> > >>>> > >>>> On 1/9/24 21:38, Leon Hwang wrote: > >>>>> Like "bpf, x64: Fix tailcall infinite loop caused by freplace", the > >>>>> same > >>>>> issue happens on arm64, too. > >>>>> > > [...] > > >>>>> > >>>> Hi Puranjay and Kuohai, > >>>> > >>>> As it's not recommended to introduce arch_bpf_run(), this is my > >>>> approach > >>>> to fix the niche case on arm64. > >>>> > >>>> Do you have any better idea to fix it? > >>>> > >>> > >>> IIUC, the recommended appraoch is to teach verifier to reject the > >>> freplace + tailcall combination. If this combiation is allowed, we > >>> will face more than just this issue. For example, what happens if > >>> a freplace prog is attached to tail callee? The freplace prog is not > >>> reachable through the tail call, right? > >>> > >> > >> It's to reject the freplace + tailcall combination partially, see "bpf, > >> x64: Fix tailcall infinite loop caused by freplace". (Oh, I should > >> separate the rejection to a standalone patch.) > >> It rejects the case that freplace prog has tailcall and its attach > >> target has no tailcall. > >> > >> As for your example, it depends on: > >> > >> freplace target reject? > >> Has tailcall? YES NO YES > >> Has tailcall? YES YES NO > >> Has tailcall? NO YES NO > >> Has tailcall? NO YES NO > >> > >> Then, freplace prog can be tail callee always. I haven't seen any bad > >> case when freplace prog is tail callee. > >> > > > > Here is a concrete case. prog1 tail calls prog2, and prog2_new is > > attached to prog2 via freplace. > > > > SEC("tc") > > int prog1(struct __sk_buff *skb) > > { > > bpf_tail_call_static(skb, &progs, 0); // tail call prog2 > > return 0; > > } > > > > SEC("tc") > > int prog2(struct __sk_buff *skb) > > { > > return 0; > > } > > > > SEC("freplace") > > int prog2_new(struct __sk_buff *skb) // target is prog2 > > { > > return 0; > > } > > > > In this case, prog2_new is not reachable, since the tail call > > target in prog2 is start address of prog2 + TAIL_CALL_OFFSET, > > which locates behind freplace/fentry callsite of prog2. > > > > This is an abnormal use case. We can do nothing with it, e.g. we're > unable to notify user that prog2_new is not reachable for this case. Since it doesn't behave as the user would expect, I think, it's better to disallow such combinations in the verifier. Either freplace is trying to patch a prog that is already in prog_array then the load of freplace prog can report a meaningful error into the verifier log or already freplaced prog is being added to prog array. I think in this case main prog tail calling into this freplace prog will actually succeed, but it's better to reject sys_bpf update command in bpf_fd_array_map_update_elem(), since being-freplaced prog is not in prog_array and won't be called, but freplace prog in prog array can be called which is inconsistent. freplace prog should act and be called just like target being-freplaced prog. I don't think this will break any actual use cases where freplace and tail call are used together.