On Mon, Nov 4, 2024 at 7:50 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 11/4/24 6:53 PM, Yonghong Song wrote:
> >
> >
> > On 11/4/24 5:55 PM, Alexei Starovoitov wrote:
> >> On Mon, Nov 4, 2024 at 5:35 PM Yonghong Song
> >> <yonghong.song@xxxxxxxxx> wrote:
> >>>
> >>> On 11/4/24 5:21 PM, Alexei Starovoitov wrote:
> >>>> On Mon, Nov 4, 2024 at 11:35 AM Yonghong Song
> >>>> <yonghong.song@xxxxxxxxx> wrote:
> >>>>> The bpf_prog_check_recur() function is currently used by trampoline
> >>>>> and tracing programs (also using trampoline) to check whether a
> >>>>> particular prog supports recursion checking or not. The default case
> >>>>> (non-trampoline progs) returns true in the current implementation.
> >>>>>
> >>>>> Let us make the non-trampoline prog recursion check return false
> >>>>> instead. It does not impact any existing use cases and allows the
> >>>>> function to be used outside the trampoline context in the next patch.
> >>>> Does not impact ?! But it does.
> >>>> This patch removes recursion check from fentry progs.
> >>>> This cannot be right.
> >>> The original bpf_prog_check_recur() implementation:
> >>>
> >>> static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
> >>> {
> >>>         switch (resolve_prog_type(prog)) {
> >>>         case BPF_PROG_TYPE_TRACING:
> >>>                 return prog->expected_attach_type != BPF_TRACE_ITER;
> >>>         case BPF_PROG_TYPE_STRUCT_OPS:
> >>>         case BPF_PROG_TYPE_LSM:
> >>>                 return false;
> >>>         default:
> >>>                 return true;
> >>>         }
> >>> }
> >>>
> >>> fentry prog is a TRACING prog, so it is covered. Did I miss anything?
> >> I see. This is way too subtle.
> >> You're correct that fentry is TYPE_TRACING,
> >> so it could have "worked" if it was used to build trampolines only.
> >>
> >> But this helper is called for other prog types:
> >>
> >>         case BPF_FUNC_task_storage_get:
> >>                 if (bpf_prog_check_recur(prog))
> >>                         return &bpf_task_storage_get_recur_proto;
> >>                 return &bpf_task_storage_get_proto;
> >>
> >> so it's still not correct, but for a different reason.
> >
> > There are four uses for func bpf_prog_check_recur() in kernel based on cscope:
> >
> > 0 kernel/bpf/trampoline.c bpf_trampoline_enter 1053 if (bpf_prog_check_recur(prog))
> > 1 kernel/bpf/trampoline.c bpf_trampoline_exit 1068 if (bpf_prog_check_recur(prog))
> > 2 kernel/trace/bpf_trace.c bpf_tracing_func_proto 1549 if (bpf_prog_check_recur(prog))
> > 3 kernel/trace/bpf_trace.c bpf_tracing_func_proto 1553 if (bpf_prog_check_recur(prog))
> >
> > The 2nd and 3rd ones are in bpf_trace.c.
> >
> > 1444 static const struct bpf_func_proto *
> > 1445 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > 1446 {
> > 1447         switch (func_id) {
> > ...
> > 1548         case BPF_FUNC_task_storage_get:
> > 1549                 if (bpf_prog_check_recur(prog))
> > 1550                         return &bpf_task_storage_get_recur_proto;
> > 1551                 return &bpf_task_storage_get_proto;
> > 1552         case BPF_FUNC_task_storage_delete:
> > 1553                 if (bpf_prog_check_recur(prog))
> > 1554                         return &bpf_task_storage_delete_recur_proto;
> > 1555                 return &bpf_task_storage_delete_proto;
> > ...
> > 1568         default:
> > 1569                 return bpf_base_func_proto(func_id, prog);
> > 1570         }
> > 1571 }
> >
> > They are used for tracing programs. So we should be safe here. But if you
> > think that changing bpf_prog_check_recur() and calling bpf_prog_check_recur()
> > in bpf_enable_priv_stack() is too subtle, I can go back to my original approach
> > which makes all supported prog types explicit in bpf_enable_priv_stack().
>
> Sorry. Format issue again.
The below is a better format:
>
> There are four uses for func bpf_prog_check_recur() in kernel based on cscope:
>
> 0 kernel/bpf/trampoline.c bpf_trampoline_enter 1053 if (bpf_prog_check_recur(prog))
> 1 kernel/bpf/trampoline.c bpf_trampoline_exit 1068 if (bpf_prog_check_recur(prog))
> 2 kernel/trace/bpf_trace.c bpf_tracing_func_proto 1549 if (bpf_prog_check_recur(prog))
> 3 kernel/trace/bpf_trace.c bpf_tracing_func_proto 1553 if (bpf_prog_check_recur(prog))
>
> The 2nd and 3rd ones are in bpf_trace.c.
>
> 1444 static const struct bpf_func_proto *
> 1445 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> 1446 {
> 1447         switch (func_id) {
> ...
> 1548         case BPF_FUNC_task_storage_get:
> 1549                 if (bpf_prog_check_recur(prog))
> 1550                         return &bpf_task_storage_get_recur_proto;
> 1551                 return &bpf_task_storage_get_proto;
> 1552         case BPF_FUNC_task_storage_delete:
> 1553                 if (bpf_prog_check_recur(prog))
> 1554                         return &bpf_task_storage_delete_recur_proto;
> 1555                 return &bpf_task_storage_delete_proto;
> ...
> 1568         default:
> 1569                 return bpf_base_func_proto(func_id, prog);
> 1570         }
> 1571 }
>
> They are used for tracing programs. So we should be safe here. But if you
> think that changing bpf_prog_check_recur() and calling bpf_prog_check_recur()
> in bpf_enable_priv_stack() is too subtle, I can go back to my original approach
> which makes all supported prog types explicit in bpf_enable_priv_stack().

What do you mean 'it's safe'?

If you change bpf_prog_check_recur() to return false like this patch does,
then kprobe progs will not have recursion protection when calling the
task_storage_get() helper. In the context of this helper it means that
kprobe progs have to use:

  nobusy = bpf_task_storage_trylock();

With this patch as-is there will be a deadlock in bpf_task_storage_lock()
when a kprobe is using task storage. So it looks broken to me.

I also don't understand the point of this patch 2.
The patch 3 can still do:

+       switch (prog->type) {
+       case BPF_PROG_TYPE_KPROBE:
+       case BPF_PROG_TYPE_TRACEPOINT:
+       case BPF_PROG_TYPE_PERF_EVENT:
+       case BPF_PROG_TYPE_RAW_TRACEPOINT:
+               return PRIV_STACK_ADAPTIVE;
+       default:
+               break;
+       }
+
+       if (!bpf_prog_check_recur(prog))
+               return NO_PRIV_STACK;

which would mean that iter, lsm, struct_ops will not be allowed to use
the priv stack, unless struct_ops explicitly requests a priv stack via a
bool flag. Then we will also add recursion protection in the trampoline.