On 11/4/24 6:51 PM, Alexei Starovoitov wrote:
On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
In the previous patch, tracing progs were enabled for private stack since
recursion checking ensures that no nested run of the same bpf prog can
happen on the same cpu.
But nested runs of a bpf subprog on the same cpu are still possible
if the same subprog is called in both the main prog and an async callback,
or in different async callbacks. For example,
main_prog
bpf_timer_set_callback(timer, timer_cb);
call sub1
sub1
...
timer_cb
call sub1
In the above case, nested runs of sub1 are possible, with one in
process context and the other in softirq context. If this is the case,
the verifier will disable private stack for this bpf prog.
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
include/linux/bpf_verifier.h | 2 ++
kernel/bpf/verifier.c | 42 +++++++++++++++++++++++++++++++-----
2 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0622c11a7e19..e921589abc72 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -669,6 +669,8 @@ struct bpf_subprog_info {
/* true if bpf_fastcall stack region is used by functions that can't be inlined */
bool keep_fastcall_stack: 1;
bool use_priv_stack: 1;
+ bool visited_with_priv_stack_accum: 1;
+ bool visited_with_priv_stack: 1;
u8 arg_cnt;
struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 406195c433ea..e01b3f0fd314 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
idx, subprog_depth);
return -EACCES;
}
- if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
+ if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) {
subprog[idx].use_priv_stack = true;
+ subprog[idx].visited_with_priv_stack = true;
+ }
+ } else {
+ subprog[idx].visited_with_priv_stack = true;
See suggestion for patch 3.
It's cleaner to rewrite with a single visited_with_priv_stack = true; statement.
Ack.
}
}
continue_func:
@@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
static int check_max_stack_depth(struct bpf_verifier_env *env)
{
struct bpf_subprog_info *si = env->subprog_info;
+ enum priv_stack_mode orig_priv_stack_supported;
enum priv_stack_mode priv_stack_supported;
int ret, subtree_depth = 0, depth_frame;
priv_stack_supported = bpf_enable_priv_stack(env->prog);
+ orig_priv_stack_supported = priv_stack_supported;
if (priv_stack_supported != NO_PRIV_STACK) {
for (int i = 0; i < env->subprog_cnt; i++) {
@@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
priv_stack_supported);
if (ret < 0)
return ret;
+
+ if (priv_stack_supported != NO_PRIV_STACK) {
+ for (int j = 0; j < env->subprog_cnt; j++) {
+ if (si[j].visited_with_priv_stack_accum &&
+ si[j].visited_with_priv_stack) {
+ /* si[j] is visited by both main/async subprog
+ * and another async subprog.
+ */
+ priv_stack_supported = NO_PRIV_STACK;
+ break;
+ }
+ if (!si[j].visited_with_priv_stack_accum)
+ si[j].visited_with_priv_stack_accum =
+ si[j].visited_with_priv_stack;
+ }
+ }
+ if (priv_stack_supported != NO_PRIV_STACK) {
+ for (int j = 0; j < env->subprog_cnt; j++)
+ si[j].visited_with_priv_stack = false;
+ }
I cannot understand what this algorithm is doing.
What is the meaning of visited_with_priv_stack_accum ?
The following is an example to show how the algorithm works.
Let us say we have prog like
main_prog0 si[0]
sub1 si[1]
sub2 si[2]
async1 si[3]
sub4 si[4]
sub2 si[2]
async2 si[5]
sub4 si[4]
sub5 si[6]
Total 7 distinct subprograms (9 nodes in the call trees).
after iteration 1 (main_prog0)
visited_with_priv_stack_accum: si[i] = false for i = 0 ... 6
visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false
for all i, visited_with_priv_stack_accum[i] && visited_with_priv_stack[i]
is false, so main_prog0 can use a priv stack.
visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
visited_with_priv_stack is cleared to all false.
after iteration 2 (async1)
visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false
Here, si[2] appears in both visited_with_priv_stack_accum and
visited_with_priv_stack, so async1 cannot have a priv stack.
In my algorithm I flipped the whole prog to no_priv_stack, which is
too conservative. We should just skip async1 and continue.
Let us say async1 does not get a priv stack while main_prog0 does.
/* the same as end of iteration 1 */
visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
visited_with_priv_stack is cleared to all false.
after iteration 3 (async2)
visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
visited_with_priv_stack: si[4] = si[5] = si[6] = true; others false
There is no conflict, so async2 can use a private stack.
If we only have one bit in bpf_subprog_info, then for an async tree,
once a subprog is marked true and we later find a conflict that makes
the whole async tree ineligible for a priv stack, it is hard to undo
the earlier markings.
So visited_with_priv_stack_accum accumulates the "true" results from
the main_prog and async passes.
Maybe we change two bit names to
visited_with_priv_stack
visited_with_priv_stack_tmp
?
}
}
- if (priv_stack_supported == NO_PRIV_STACK && subtree_depth > MAX_BPF_STACK) {
- verbose(env, "combined stack size of %d calls is %d. Too large\n",
- depth_frame, subtree_depth);
- return -EACCES;
+ if (priv_stack_supported == NO_PRIV_STACK) {
+ if (subtree_depth > MAX_BPF_STACK) {
+ verbose(env, "combined stack size of %d calls is %d. Too large\n",
+ depth_frame, subtree_depth);
+ return -EACCES;
+ }
+ if (orig_priv_stack_supported == PRIV_STACK_ADAPTIVE) {
+ for (int i = 0; i < env->subprog_cnt; i++)
+ si[i].use_priv_stack = false;
+ }
Why? This patch is supposed to clear use_priv_stack only from subprogs
that are dual-called, and only from those subprogs.
All other subprogs are fine.
But it seems the algo attempts to detect one such calling scenario
and then disables priv_stack everywhere?
Sorry about this. Will fix in the next revision.