Re: [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback

On 11/4/24 6:51 PM, Alexei Starovoitov wrote:
On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
In previous patch, tracing progs are enabled for private stack since
recursion checking ensures there exists no nested same bpf prog run on
the same cpu.

But it is still possible for nested bpf subprog run on the same cpu
if the same subprog is called in both main prog and async callback,
or in different async callbacks. For example,
   main_prog
    bpf_timer_set_callback(timer, timer_cb);
    call sub1
   sub1
    ...
   timer_cb
    call sub1

In the above case, nested subprog run for sub1 is possible with one in
process context and the other in softirq context. If this is the case,
the verifier will disable private stack for this bpf prog.

Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
  include/linux/bpf_verifier.h |  2 ++
  kernel/bpf/verifier.c        | 42 +++++++++++++++++++++++++++++++-----
  2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0622c11a7e19..e921589abc72 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -669,6 +669,8 @@ struct bpf_subprog_info {
         /* true if bpf_fastcall stack region is used by functions that can't be inlined */
         bool keep_fastcall_stack: 1;
         bool use_priv_stack: 1;
+       bool visited_with_priv_stack_accum: 1;
+       bool visited_with_priv_stack: 1;

         u8 arg_cnt;
         struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 406195c433ea..e01b3f0fd314 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
                                         idx, subprog_depth);
                                 return -EACCES;
                         }
-                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
+                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) {
                                 subprog[idx].use_priv_stack = true;
+                               subprog[idx].visited_with_priv_stack = true;
+                       }
+               } else {
+                       subprog[idx].visited_with_priv_stack = true;
See suggestion for patch 3.
It's cleaner to rewrite with a single visited_with_priv_stack = true; statement.

Ack.


                 }
         }
  continue_func:
@@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
  static int check_max_stack_depth(struct bpf_verifier_env *env)
  {
         struct bpf_subprog_info *si = env->subprog_info;
+       enum priv_stack_mode orig_priv_stack_supported;
         enum priv_stack_mode priv_stack_supported;
         int ret, subtree_depth = 0, depth_frame;

         priv_stack_supported = bpf_enable_priv_stack(env->prog);
+       orig_priv_stack_supported = priv_stack_supported;

         if (priv_stack_supported != NO_PRIV_STACK) {
                 for (int i = 0; i < env->subprog_cnt; i++) {
@@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
                                                             priv_stack_supported);
                         if (ret < 0)
                                 return ret;
+
+                       if (priv_stack_supported != NO_PRIV_STACK) {
+                               for (int j = 0; j < env->subprog_cnt; j++) {
+                                       if (si[j].visited_with_priv_stack_accum &&
+                                           si[j].visited_with_priv_stack) {
+                                               /* si[j] is visited by both main/async subprog
+                                                * and another async subprog.
+                                                */
+                                               priv_stack_supported = NO_PRIV_STACK;
+                                               break;
+                                       }
+                                       if (!si[j].visited_with_priv_stack_accum)
+                                               si[j].visited_with_priv_stack_accum =
+                                                       si[j].visited_with_priv_stack;
+                               }
+                       }
+                       if (priv_stack_supported != NO_PRIV_STACK) {
+                               for (int j = 0; j < env->subprog_cnt; j++)
+                                       si[j].visited_with_priv_stack = false;
+                       }
I cannot understand what this algorithm is doing.
What is the meaning of visited_with_priv_stack_accum ?

The following is an example to show how the algorithm works.
Let us say we have prog like
   main_prog0  si[0]
     sub1      si[1]
     sub2      si[2]
   async1      si[3]
     sub4      si[4]
     sub2      si[2]
   async2      si[5]
     sub4      si[4]
     sub5      si[6]
Total 7 subprograms.

after iteration 1 (main_prog0)
   visited_with_priv_stack_accum: si[i] = false for i = 0 ... 6
   visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false

   for all i, visited_with_priv_stack_accum[i] && visited_with_priv_stack[i]
   is false, so main_prog0 can use priv stack.

   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
   visited_with_priv_stack cleared with false.

after iteration 2 (async1)
   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
   visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false

   Here, si[2] appears in both visited_with_priv_stack_accum and
   visited_with_priv_stack, so async1 cannot have priv stack.

   In my algorithm, I flipped the whole thing to NO_PRIV_STACK, which is
   too conservative. We should just skip async1 and continue.

   Let us say async1 does not get priv stack while main_prog0 does.

   /* the same as end of iteration 1 */
   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
   visited_with_priv_stack cleared with false.

after iteration 3 (async2)
   visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
   visited_with_priv_stack: si[4] = si[5] = si[6] = true;

   si[4] was also visited by async1, but async1 was skipped and never
   accumulated, so there is no conflict and async2 can use private stack.


If we only had one bit in bpf_subprog_info, then for an async tree,
if we mark a subprog true and later find a conflict in that
async tree, making the whole async tree ineligible for priv stack
would require undoing the earlier markings, which is hard.

So visited_with_priv_stack_accum accumulates the "true" results from
main_prog and the async callbacks whose trees had no conflict.
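
To make the walk-through above concrete, here is a minimal user-space
sketch of the two-bit scheme. It is not verifier code: struct subprog_bits,
commit_tree() and run_example() are hypothetical names, the call trees are
hard-coded from the example, and it implements the "skip only the
conflicting tree" variant rather than the v9 behavior of flipping
everything to NO_PRIV_STACK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSUB 7	/* si[0] .. si[6] in the example above */

/* Hypothetical stand-in for the two bits in struct bpf_subprog_info. */
struct subprog_bits {
	bool accum;	/* plays the role of visited_with_priv_stack_accum */
	bool cur;	/* plays the role of visited_with_priv_stack */
};

/*
 * Process one call tree (main prog or one async callback).
 * tree[] lists the subprog indices reachable from the tree's root.
 * Returns true if the tree may use private stack, false on conflict.
 */
static bool commit_tree(struct subprog_bits *si, const int *tree, size_t n)
{
	bool conflict = false;

	/* Mark everything this tree visits in the temporary bit. */
	for (size_t k = 0; k < n; k++)
		si[tree[k]].cur = true;

	/* A subprog already accumulated by an earlier tree is a conflict. */
	for (int j = 0; j < NSUB; j++)
		if (si[j].accum && si[j].cur)
			conflict = true;

	/* Accumulate only when the whole tree is clean, then reset. */
	for (int j = 0; j < NSUB; j++) {
		if (!conflict && si[j].cur)
			si[j].accum = true;
		si[j].cur = false;
	}
	return !conflict;
}

/* Runs the three iterations; bit r is set if tree r keeps private stack. */
static unsigned int run_example(void)
{
	static const int main_prog0[] = { 0, 1, 2 };
	static const int async1[] = { 3, 4, 2 };
	static const int async2[] = { 5, 4, 6 };
	struct subprog_bits si[NSUB];
	unsigned int ok = 0;

	memset(si, 0, sizeof(si));
	if (commit_tree(si, main_prog0, 3))
		ok |= 1u << 0;
	if (commit_tree(si, async1, 3))
		ok |= 1u << 1;
	if (commit_tree(si, async2, 3))
		ok |= 1u << 2;
	return ok;
}
```

In this sketch run_example() reports main_prog0 and async2 as eligible and
async1 as not. The temporary bit is what lets a conflicting tree be dropped
without having to undo anything in the accumulated bit.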

Maybe we rename the two bits to
  visited_with_priv_stack
  visited_with_priv_stack_tmp
?


                 }
         }

-       if (priv_stack_supported == NO_PRIV_STACK && subtree_depth > MAX_BPF_STACK) {
-               verbose(env, "combined stack size of %d calls is %d. Too large\n",
-                       depth_frame, subtree_depth);
-               return -EACCES;
+       if (priv_stack_supported == NO_PRIV_STACK) {
+               if (subtree_depth > MAX_BPF_STACK) {
+                       verbose(env, "combined stack size of %d calls is %d. Too large\n",
+                               depth_frame, subtree_depth);
+                       return -EACCES;
+               }
+               if (orig_priv_stack_supported == PRIV_STACK_ADAPTIVE) {
+                       for (int i = 0; i < env->subprog_cnt; i++)
+                               si[i].use_priv_stack = false;
+               }
Why? This patch is supposed to clear use_priv_stack from subprogs
that are dual called, and only from those subprogs.
All other subprogs are fine.

But it seems the algorithm attempts to detect one such calling scenario
and disables priv_stack everywhere?

Sorry about this. Will fix in the next revision.
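
For illustration only, one possible reading of the review comment above
(clear use_priv_stack only on dual-called subprogs) could look like the
user-space sketch below. This is an assumption about the direction, not
the next revision of the patch: the nr_roots counter, note_tree(),
clear_dual_called() and dual_called_example() are all hypothetical, and
each tree is assumed to list a given subprog at most once.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSUB 7	/* si[0] .. si[6] from the example earlier in the thread */

/* Hypothetical per-subprog state; nr_roots does not exist in the kernel. */
struct subprog_flags {
	bool use_priv_stack;
	unsigned int nr_roots;	/* number of call trees reaching this subprog */
};

/* Record that one call tree (main prog or async callback) reaches tree[]. */
static void note_tree(struct subprog_flags *si, const int *tree, size_t n)
{
	for (size_t k = 0; k < n; k++)
		si[tree[k]].nr_roots++;
}

/* Clear private stack only on subprogs reached from more than one tree. */
static void clear_dual_called(struct subprog_flags *si)
{
	for (int j = 0; j < NSUB; j++)
		if (si[j].nr_roots > 1)
			si[j].use_priv_stack = false;
}

/* Returns a bitmask of subprogs that still have use_priv_stack set. */
static unsigned int dual_called_example(void)
{
	static const int main_prog0[] = { 0, 1, 2 };
	static const int async1[] = { 3, 4, 2 };
	static const int async2[] = { 5, 4, 6 };
	struct subprog_flags si[NSUB];
	unsigned int mask = 0;

	/* Assume every subprog initially qualified for private stack. */
	for (int j = 0; j < NSUB; j++) {
		si[j].use_priv_stack = true;
		si[j].nr_roots = 0;
	}

	note_tree(si, main_prog0, 3);
	note_tree(si, async1, 3);
	note_tree(si, async2, 3);
	clear_dual_called(si);

	for (int j = 0; j < NSUB; j++)
		if (si[j].use_priv_stack)
			mask |= 1u << j;
	return mask;
}
```

With the example call trees, only si[2] (sub2) and si[4] (sub4) are reached
from more than one tree, so only those two lose use_priv_stack here.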




