Re: [PATCH bpf-next v2 02/20] bpf: allow loading of a bpf_iter program

Yonghong Song <yhs@xxxxxx> · Tue, 5 May 2020 17:07:18 -0700

On 5/5/20 2:29 PM, Andrii Nakryiko wrote:
On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@xxxxxx> wrote:

A bpf_iter program is a tracing program with attach type
BPF_TRACE_ITER. The load attribute
   attach_btf_id
is used by the verifier against a particular kernel function,
which represents a target, e.g., __bpf_iter__bpf_map
for target bpf_map which is implemented later.

The program return value must be 0 or 1 for now.
   0 : successful, except potential seq_file buffer overflow
       which is handled by seq_file reader.
   1 : request to restart the same object

This bit is interesting. Is the idea that if BPF program also wants to
send something over, say, perf_buffer, but fails, it can "request"
same execution again? I wonder if typical libc fread() implementation

Yes. The bpf_seq_read() can handle this the same as any other
retry request. The following is current mapping.
   bpf program return 0   ---> seq_ops->show() return 0
   bpf program return 1   ---> seq_ops->show() return -EAGAIN

would handle EAGAIN properly, it seems more driven towards
non-blocking I/O?

I did not have a test for this in current patch set for bpf program
returning 1. Will add a test in the next version.


On the other hand, following start/show/next logic for seq_file
iteration, requesting skipping element seems useful. It would allow
(in some cases) to "speculatively" generate output and at some point
realize that this is not an element we actually want in the output and
request to ignore that output.

Don't know how useful the latter is going to be in practice, but just
something to keep in mind for the future, I guess...


In the future, other return values may be used for filtering or
teminating the iterator.

Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
  include/linux/bpf.h            |  3 +++
  include/uapi/linux/bpf.h       |  1 +
  kernel/bpf/bpf_iter.c          | 30 ++++++++++++++++++++++++++++++
  kernel/bpf/verifier.c          | 21 +++++++++++++++++++++
  tools/include/uapi/linux/bpf.h |  1 +
  5 files changed, 56 insertions(+)


[...]


+
+bool bpf_iter_prog_supported(struct bpf_prog *prog)
+{
+       const char *attach_fname = prog->aux->attach_func_name;
+       u32 prog_btf_id = prog->aux->attach_btf_id;
+       const char *prefix = BPF_ITER_FUNC_PREFIX;
+       struct bpf_iter_target_info *tinfo;
+       int prefix_len = strlen(prefix);
+       bool supported = false;
+
+       if (strncmp(attach_fname, prefix, prefix_len))
+               return false;
+
+       mutex_lock(&targets_mutex);
+       list_for_each_entry(tinfo, &targets, list) {
+               if (tinfo->btf_id && tinfo->btf_id == prog_btf_id) {
+                       supported = true;
+                       break;
+               }
+               if (!strcmp(attach_fname + prefix_len, tinfo->target)) {
+                       tinfo->btf_id = prog->aux->attach_btf_id;

This target_info->btf_id caching here is a bit subtle and easy to
miss, it would be nice to have a code calling this out explicitly.

Will do.

Thanks!

+                       supported = true;
+                       break;
+               }
+       }
+       mutex_unlock(&targets_mutex);
+
+       return supported;
+}

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 70ad009577f8..d725ff7d11db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7101,6 +7101,10 @@ static int check_return_code(struct bpf_verifier_env *env)
                         return 0;
                 range = tnum_const(0);
                 break;
+       case BPF_PROG_TYPE_TRACING:
+               if (env->prog->expected_attach_type != BPF_TRACE_ITER)
+                       return 0;

Commit message mentions enforcing [0, 1], shouldn't it be done here?

The default range is [0, 1], hence no explicit assignment here.

static int check_return_code(struct bpf_verifier_env *env)
{
        struct tnum enforce_attach_type_range = tnum_unknown;
        const struct bpf_prog *prog = env->prog;
        struct bpf_reg_state *reg;
        struct tnum range = tnum_range(0, 1);
......



+               break;
         default:
                 return 0;
         }

[...]