Re: [PATCH bpf-next 1/3] bpf: Add bpf_for_each helper

On 11/18/21 7:23 PM, Joanne Koong wrote:

On 11/18/21 12:14 PM, Andrii Nakryiko wrote:

On Wed, Nov 17, 2021 at 5:06 PM Joanne Koong <joannekoong@xxxxxx> wrote:
This patch adds the kernel-side and API changes for a new helper
function, bpf_for_each:

long bpf_for_each(u32 nr_iterations, void *callback_fn,
                  void *callback_ctx, u64 flags);
foreach in other languages is usually used when you are iterating over
the elements of some data structure or stream of data, so the naming feels
slightly off. bpf_loop() or bpf_for_range() seems to point more directly
to what's going on. My 2 cents; it's subjective, of course.

Ooh, I really like "bpf_loop()"! I will change it to this for v2.
bpf_for_each invokes the "callback_fn" nr_iterations times, or until the
callback_fn returns 1.
As Toke mentioned, we don't really check 1. Enforcing it on verifier
side is just going to cause more troubles for users and doesn't seem
important. I can see two ways to define the semantics, with error
propagation and without.

For error propagation, we can define:
   - >0, break and return number of iterations performed in total;
   - 0, continue to next iteration, if no more iterations, return
number of iterations performed;
   - <0, break and return that error value (but no way to know at which
iteration this happened, except through custom context);

Or we can make it simpler and just:
   - 0, continue;
   - != 0, break;
   - always return number of iterations performed.


No strong preferences on my side, I see benefits to both ways.
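
For concreteness, here is a minimal sketch of what the simpler variant
(0 - continue, != 0 - break, always return the iteration count) could
look like on the kernel side. This is only an illustration of the
return-value contract, not the body from the patch:

BPF_CALL_4(bpf_for_each, u32, nr_iterations, void *, callback_fn,
	   void *, callback_ctx, u64, flags)
{
	bpf_callback_t callback = (bpf_callback_t)callback_fn;
	u64 ret;
	u32 i;

	if (flags)
		return -EINVAL;

	for (i = 0; i < nr_iterations; i++) {
		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
		/* 0 means continue; any non-zero value breaks out early */
		if (ret)
			return i + 1;
	}

	/* always report the number of iterations performed */
	return i;
}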
This is already enforced in the verifier (as Yonghong mentioned as well)
in prepare_func_exit() -
          if (callee->in_callback_fn) {
                  /* enforce R0 return value range [0, 1]. */
                  struct tnum range = tnum_range(0, 1);

                  if (r0->type != SCALAR_VALUE) {
                          verbose(env, "R0 not a scalar value\n");
                          return -EACCES;
                  }
                  if (!tnum_in(range, r0->var_off)) {
                          verbose_invalid_scalar(env, r0, &range,
                                                 "callback return", "R0");
                          return -EINVAL;
                  }
          }
The verifier enforces that, at callback return, the R0 register is always 0 or 1.
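
For reference, a BPF-program-side sketch under that contract might look
like the below (assuming bpf_helper_defs.h has been regenerated from the
patched UAPI header so that bpf_for_each is declared; all names here are
illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

static long sum_cb(__u32 index, void *ctx)
{
	long *sum = ctx;

	*sum += index;
	/* must return 0 (continue) or 1 (break) to satisfy the verifier */
	return 0;
}

SEC("tp/syscalls/sys_enter_getpgid")
int sum_prog(void *ctx)
{
	long sum = 0;

	/* run the callback 100 times, passing &sum as callback_ctx */
	bpf_for_each(100, sum_cb, &sum, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";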

A few things to please note:
~ The "u64 flags" parameter is currently unused but is included in
case a future use case for it arises.
~ In the kernel-side implementation of bpf_for_each (kernel/bpf/bpf_iter.c),
bpf_callback_t is used as the callback function cast.
~ A program can have nested bpf_for_each calls, but the program must
still adhere to the verifier's stack depth constraint (the stack depth
cannot exceed MAX_BPF_STACK)
~ The next patch will include the tests and benchmark

Signed-off-by: Joanne Koong <joannekoong@xxxxxx>
---
  include/linux/bpf.h            |  1 +
  include/uapi/linux/bpf.h       | 23 +++++++++++++++++++++++
  kernel/bpf/bpf_iter.c          | 32 ++++++++++++++++++++++++++++++++
  kernel/bpf/helpers.c           |  2 ++
  kernel/bpf/verifier.c          | 28 ++++++++++++++++++++++++++++
  tools/include/uapi/linux/bpf.h | 23 +++++++++++++++++++++++
  6 files changed, 109 insertions(+)

[...]
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index b2ee45064e06..cb742c50898a 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -714,3 +714,35 @@ const struct bpf_func_proto bpf_for_each_map_elem_proto = {
         .arg3_type      = ARG_PTR_TO_STACK_OR_NULL,
         .arg4_type      = ARG_ANYTHING,
  };
+
+BPF_CALL_4(bpf_for_each, u32, nr_iterations, void *, callback_fn, void *, callback_ctx,
+          u64, flags)
+{
+       bpf_callback_t callback = (bpf_callback_t)callback_fn;
+       u64 err;
+       u32 i;
+
I wonder if we should have some high but reasonable iteration limit. It
would be too easy for users to cause some overflow, not notice it, and
then pass 4bln iterations and freeze the kernel. I think limiting to
something like 1mln or 8mln might be ok. Thoughts?

I see the pros and cons of both. It doesn't seem that likely to me that users
would accidentally pass in a negative value (which would wrap around to a huge
u32). At the same time, I don't think limiting it to something like 1 or 8
million is unreasonable (though it might require the user to read the
documentation more closely :)) - if the user wants to do more than 8 million
loops, they can call the loop helper multiple times.

As a user, I think I'd prefer u32, where I'd automatically know the limit is
2^32 - 1, but either approach sounds good to me!
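
If a cap is added, it could be as simple as a guard at the top of the
helper; a rough sketch (the constant name and value here are only
illustrative, not something from this series):

/* e.g. next to the other limits in include/linux/bpf.h */
#define BPF_MAX_LOOPS	(8 * 1024 * 1024)

	/* at the top of the bpf_for_each() body */
	if (nr_iterations > BPF_MAX_LOOPS)
		return -E2BIG;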

[...]
  static int set_timer_callback_state(struct bpf_verifier_env *env,
                                     struct bpf_func_state *caller,
                                     struct bpf_func_state *callee,
@@ -6482,6 +6503,13 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
                         return -EINVAL;
         }

+       if (func_id == BPF_FUNC_for_each) {
+               err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+                                       set_for_each_callback_state);
+               if (err < 0)
+                       return -EINVAL;
+       }
+
We should convert these ifs (they are not even if/elses!) into a switch,
and move the if (err < 0) return err; outside. It will only keep growing.
Sounds great, I will do this in v2!
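
For the record, the converted code could look roughly like this (sketch
only, reusing the callback-state setters from this series; the final
names may change with the bpf_loop() rename):

	switch (func_id) {
	case BPF_FUNC_for_each_map_elem:
		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
					set_map_elem_callback_state);
		break;
	case BPF_FUNC_timer_set_callback:
		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
					set_timer_callback_state);
		break;
	case BPF_FUNC_for_each:
		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
					set_for_each_callback_state);
		break;
	default:
		break;
	}

	if (err < 0)
		return err;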
[...]
(resending this email to the vger mailing list because it didn't go through the first time)


