On 4/29/20 6:41 PM, Andrii Nakryiko wrote:
On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@xxxxxx> wrote:
Three new libbpf APIs are added to support bpf_iter:
- bpf_program__attach_iter
Given a bpf program and additional parameters, which is
none now, returns a bpf_link.
- bpf_link__create_iter
Given a bpf_link, create a bpf_iter and return a fd
so user can then do read() to get seq_file output data.
- bpf_iter_create
syscall level API to create a bpf iterator.
Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced.
These two macros can help bpf program writers with
nicer bpf_seq_printf syntax similar to the kernel one.
Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
tools/lib/bpf/bpf.c | 11 +++++++
tools/lib/bpf/bpf.h | 2 ++
tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++
tools/lib/bpf/libbpf.c | 60 +++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 11 +++++++
tools/lib/bpf/libbpf.map | 7 +++++
6 files changed, 114 insertions(+)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5cc1b0785d18..7ffd6c0ad95f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd,
return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
}
+int bpf_iter_create(int link_fd, unsigned int flags)
Do you envision anything more than just flags being passed for
bpf_iter_create? I wonder if we should just go ahead with options
struct here?
I think most, if not all, parameters should go to link create.
This way, we can have the identical anon_iter through:
link -> anon_iter
link -> pinned file -> anon_iter
I do not really expect any more fields for bpf_iter_create.
The flags here is for potential future extension, which I
have no idea how it looks like.
+{
+ union bpf_attr attr;
+
+ memset(&attr, 0, sizeof(attr));
+ attr.iter_create.link_fd = link_fd;
+ attr.iter_create.flags = flags;
+
+ return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr));
+}
+
[...]
+/*
+ * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values
+ * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for
+ * bpf_seq_printf().
+ */
+#define BPF_SEQ_PRINTF0(seq, fmt) \
+ ({ \
+ int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \
+ (void *)0, 0); \
+ ret; \
+ })
+
+#define BPF_SEQ_PRINTF(seq, fmt, args...) \
You can unify BPF_SEQ_PRINTF and BPF_SEQ_PRINTF0 by using
___bpf_empty() macro. See bpf_core_read.h for similar use case.
Specifically, look at ___empty (equivalent of ___bpf_empty) and
___core_read, ___core_read0, ___core_readN macro.
Thanks for the tip. Will try.
+ ({ \
+ _Pragma("GCC diagnostic push") \
+ _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
+ __u64 param[___bpf_narg(args)] = { args }; \
Do you need to provide the size of array here? If you omit
__bpf_narg(args), wouldn't compiler automatically calculate the right
size?
Yes, compiler should calculate correct size.
Also, can you please use "unsigned long long" to not have any implicit
dependency on __u64 being defined?
Will do.
+ _Pragma("GCC diagnostic pop") \
+ int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \
+ param, sizeof(param)); \
+ ret; \
+ })
+
#endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8e1dc6980fac..ffdc4d8e0cc0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = {
.is_attach_btf = true,
.expected_attach_type = BPF_LSM_MAC,
.attach_fn = attach_lsm),
+ SEC_DEF("iter/", TRACING,
+ .expected_attach_type = BPF_TRACE_ITER,
+ .is_attach_btf = true),
It would be nice to implement auto-attach capabilities (similar to
fentry/fexit, lsm and raw_tracepoint). Section name should have enough
information for this, no?
In the current form, yes, auto attach is possible.
But I am thinking we may soon have additional information
like map_id (appear in link_create) etc.
to make auto attach not possible. That is why
I implemented an explicit attach. is this assessment correct?
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
@@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj,
[...]
+
+ link = calloc(1, sizeof(*link));
+ if (!link)
+ return ERR_PTR(-ENOMEM);
+ link->detach = &bpf_link__detach_fd;
+
+ attach_type = bpf_program__get_expected_attach_type(prog);
Given you know it has to be BPF_TRACE_ITER, it's better to explicitly
specify that. If provided program wasn't loaded with correct
expected_attach_type, kernel will reject it. But if you don't do it,
then you can accidentally create some other type of bpf_link.
Yes, will do.
+ link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL);
+ if (link_fd < 0) {
+ link_fd = -errno;
+ free(link);
+ pr_warn("program '%s': failed to attach to iterator: %s\n",
+ bpf_program__title(prog, false),
+ libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+ return ERR_PTR(link_fd);
+ }
+ link->fd = link_fd;
+ return link;
+}
+
+int bpf_link__create_iter(struct bpf_link *link, unsigned int flags)
+{
Same question as for low-level bpf_link_create(). If we expect the
need to extend optional things in the future, I'd add opts right now.
But I wonder if bpf_link__create_iter() provides any additional value
beyond bpf_iter_create(). Maybe let's not add it (yet)?
The only additional thing is better warning messsage.
Agree that is so marginal. Will drop it.
+ char errmsg[STRERR_BUFSIZE];
+ int iter_fd;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link), flags);
+ if (iter_fd < 0) {
+ iter_fd = -errno;
+ pr_warn("failed to create an iterator: %s\n",
+ libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg)));
+ }
+
+ return iter_fd;
+}
+
[...]