On Mon, Nov 4, 2024 at 7:37 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 11/4/24 6:51 PM, Alexei Starovoitov wrote:
> > On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> In the previous patch, tracing progs were enabled for private stack,
> >> since recursion checking ensures there is no nested run of the same
> >> bpf prog on the same cpu.
> >>
> >> But a nested run of a bpf subprog on the same cpu is still possible
> >> if the same subprog is called from both the main prog and an async
> >> callback, or from different async callbacks. For example,
> >>   main_prog
> >>     bpf_timer_set_callback(timer, timer_cb);
> >>     call sub1
> >>   sub1
> >>     ...
> >>   timer_cb
> >>     call sub1
> >>
> >> In the above case, a nested run of sub1 is possible, with one run in
> >> process context and the other in softirq context. If this is the
> >> case, the verifier will disable private stack for this bpf prog.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>  include/linux/bpf_verifier.h |  2 ++
> >>  kernel/bpf/verifier.c        | 42 +++++++++++++++++++++++++++++++-----
> >>  2 files changed, 39 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index 0622c11a7e19..e921589abc72 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -669,6 +669,8 @@ struct bpf_subprog_info {
> >>         /* true if bpf_fastcall stack region is used by functions that can't be inlined */
> >>         bool keep_fastcall_stack: 1;
> >>         bool use_priv_stack: 1;
> >> +       bool visited_with_priv_stack_accum: 1;
> >> +       bool visited_with_priv_stack: 1;
> >>
> >>         u8 arg_cnt;
> >>         struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 406195c433ea..e01b3f0fd314 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
> >>                                 idx, subprog_depth);
> >>                         return -EACCES;
> >>                 }
> >> -               if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
> >> +               if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) {
> >>                         subprog[idx].use_priv_stack = true;
> >> +                       subprog[idx].visited_with_priv_stack = true;
> >> +               }
> >> +       } else {
> >> +               subprog[idx].visited_with_priv_stack = true;
> >
> > See suggestion for patch 3.
> > It's cleaner to rewrite with a single visited_with_priv_stack = true; statement.
>
> Ack.
>
> >
> >>         }
> >>         }
> >>  continue_func:
> >> @@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
> >>  static int check_max_stack_depth(struct bpf_verifier_env *env)
> >>  {
> >>         struct bpf_subprog_info *si = env->subprog_info;
> >> +       enum priv_stack_mode orig_priv_stack_supported;
> >>         enum priv_stack_mode priv_stack_supported;
> >>         int ret, subtree_depth = 0, depth_frame;
> >>
> >>         priv_stack_supported = bpf_enable_priv_stack(env->prog);
> >> +       orig_priv_stack_supported = priv_stack_supported;
> >>
> >>         if (priv_stack_supported != NO_PRIV_STACK) {
> >>                 for (int i = 0; i < env->subprog_cnt; i++) {
> >> @@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
> >>                                                    priv_stack_supported);
> >>                 if (ret < 0)
> >>                         return ret;
> >> +
> >> +               if (priv_stack_supported != NO_PRIV_STACK) {
> >> +                       for (int j = 0; j < env->subprog_cnt; j++) {
> >> +                               if (si[j].visited_with_priv_stack_accum &&
> >> +                                   si[j].visited_with_priv_stack) {
> >> +                                       /* si[j] is visited by both main/async subprog
> >> +                                        * and another async subprog.
> >> +                                        */
> >> +                                       priv_stack_supported = NO_PRIV_STACK;
> >> +                                       break;
> >> +                               }
> >> +                               if (!si[j].visited_with_priv_stack_accum)
> >> +                                       si[j].visited_with_priv_stack_accum =
> >> +                                               si[j].visited_with_priv_stack;
> >> +                       }
> >> +               }
> >> +               if (priv_stack_supported != NO_PRIV_STACK) {
> >> +                       for (int j = 0; j < env->subprog_cnt; j++)
> >> +                               si[j].visited_with_priv_stack = false;
> >> +               }
> >
> > I cannot understand what this algorithm is doing.
> > What is the meaning of visited_with_priv_stack_accum ?
>
> The following is an example to show how the algorithm works.
> Let us say we have a prog like
>   main_prog0      si[0]
>     sub1          si[1]
>       sub2        si[2]
>   async1          si[3]
>     sub4          si[4]
>       sub2        si[2]
>   async2          si[5]
>     sub4          si[4]
>     sub5          si[6]
>
> 7 subprograms in total (9 nodes in the call tree above).
>
> after iteration 1 (main_prog0)
>   visited_with_priv_stack_accum: si[i] = false for i = 0 ... 6
>   visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false
>
> For all i, visited_with_priv_stack_accum[i] && visited_with_priv_stack[i]
> is false, so main_prog0 can use the priv stack:
>
>   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>   visited_with_priv_stack: cleared to false.
>
> after iteration 2 (async1)
>   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>   visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false
>
> Here, si[2] appears in both visited_with_priv_stack_accum and
> visited_with_priv_stack, so async1 cannot have a priv stack.
>
> In my algorithm, I flipped the whole prog to no_priv_stack, which is
> too conservative. We should just skip async1 and continue.
>
> Let us say we mark async1 as not having a priv stack while main_prog0
> keeps it:
>
> /* the same as at the end of iteration 1 */
>   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>   visited_with_priv_stack: cleared to false.
> after iteration 3 (async2)
>   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>   visited_with_priv_stack: si[4] = si[5] = si[6] = true; others false
>
> There is no conflict, so async2 can use the private stack.
>
> If we only have one bit in bpf_subprog_info, then for an async tree,
> if we mark a subprog as true and later find a conflict in that async
> tree, so the whole async tree becomes ineligible for the priv stack,
> it will be hard to undo the previous markings.
>
> So visited_with_priv_stack_accum accumulates the "true" results from
> the main_prog/async trees.

I see. I think it works, but it feels complicated.
It should be possible to do this without the extra flags.
check_max_stack_depth_subprog() will know whether it was called to
verify an async_cb or not, so it's just a matter of adding a single
'if' to it:

        if (subprog[idx].use_priv_stack && checking_async_cb)
                /* reset to false due to potential recursion */
                subprog[idx].use_priv_stack = false;

check_max_stack_depth() starts with i == 0, so reachable and eligible
subprogs will be marked with use_priv_stack.
Then check_max_stack_depth_subprog() will be called again to verify the
async callbacks. If it sees the mark, it's the bad case.

What am I missing?