On Tue, Dec 17, 2019 at 07:07:23PM -0800, Andrii Nakryiko wrote: > On Fri, Dec 13, 2019 at 4:48 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > This patch adds BPF STRUCT_OPS support to libbpf. > > > > The only sec_name convention is SEC("struct_ops") to identify the > > struct ops implemented in BPF, e.g. > > SEC("struct_ops") > > struct tcp_congestion_ops dctcp = { > > .init = (void *)dctcp_init, /* <-- a bpf_prog */ > > /* ... some more func prts ... */ > > .name = "bpf_dctcp", > > }; > > > > In the bpf_object__open phase, libbpf will look for the "struct_ops" > > elf section and find out what is the btf-type the "struct_ops" is > > implementing. Note that the btf-type here is referring to > > a type in the bpf_prog.o's btf. It will then collect (through SHT_REL) > > where are the bpf progs that the func ptrs are referring to. > > > > In the bpf_object__load phase, the prepare_struct_ops() will load > > the btf_vmlinux and obtain the corresponding kernel's btf-type. > > With the kernel's btf-type, it can then set the prog->type, > > prog->attach_btf_id and the prog->expected_attach_type. Thus, > > the prog's properties do not rely on its section name. > > > > Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching > > process is as simple as: member-name match + btf-kind match + size match. > > If these matching conditions fail, libbpf will reject. > > The current targeting support is "struct tcp_congestion_ops" which > > most of its members are function pointers. > > The member ordering of the bpf_prog's btf-type can be different from > > the btf_vmlinux's btf-type. > > > > Once the prog's properties are all set, > > the libbpf will proceed to load all the progs. > > > > After that, register_struct_ops() will create a map, finalize the > > map-value by populating it with the prog-fd, and then register this > > "struct_ops" to the kernel by updating the map-value to the map. > > > > By default, libbpf does not unregister the struct_ops from the kernel > > during bpf_object__close(). It can be changed by setting the new > > "unreg_st_ops" in bpf_object_open_opts. > > > > Signed-off-by: Martin KaFai Lau <kafai@xxxxxx> > > --- > > This looks pretty good to me. The big two things is exposing structops > as real struct bpf_map, so that users can interact with it using > libbpf APIs, as well as splitting struct_ops map creation and > registration. bpf_object__load() should only make sure all maps are > created, progs are loaded/verified, but none of BPF program can yet be > called. Then attach is the phase where registration happens. Thanks for the review. [ ... ] > > static inline __u64 ptr_to_u64(const void *ptr) > > { > > return (__u64) (unsigned long) ptr; > > @@ -233,6 +239,32 @@ struct bpf_map { > > bool reused; > > }; > > > > +struct bpf_struct_ops { > > + const char *var_name; > > + const char *tname; > > + const struct btf_type *type; > > + struct bpf_program **progs; > > + __u32 *kern_func_off; > > + /* e.g. struct tcp_congestion_ops in bpf_prog's btf format */ > > + void *data; > > + /* e.g. struct __bpf_tcp_congestion_ops in btf_vmlinux's btf > > Using __bpf_ prefix for this struct_ops-specific types is a bit too > generic (e.g., for raw_tp stuff Alexei used btf_trace_). So maybe make > it btf_ops_ or btf_structops_? Is it a concern on name collision? The prefix pick is to use a more representative name. struct_ops use many bpf pieces and btf is one of them. Very soon, all new codes will depend on BTF and btf_ prefix could become generic also. Unlike tracepoint, there is no non-btf version of struct_ops. > > > > + * format. > > + * struct __bpf_tcp_congestion_ops { > > + * [... some other kernel fields ...] > > + * struct tcp_congestion_ops data; > > + * } > > + * kern_vdata in the sizeof(struct __bpf_tcp_congestion_ops). > > Comment isn't very clear.. do you mean that data pointed to by > kern_vdata is of sizeof(...) bytes? > > > + * prepare_struct_ops() will populate the "data" into > > + * "kern_vdata". > > + */ > > + void *kern_vdata; > > + __u32 type_id; > > + __u32 kern_vtype_id; > > + __u32 kern_vtype_size; > > + int fd; > > + bool unreg; > > This unreg flag (and default behavior to not unregister) is bothering > me a bit.. Shouldn't this be controlled by map's lifetime, at least. > E.g., if no one pins that map - then struct_ops should be unregistered > on map destruction. If application wants to keep BPF programs > attached, it should make sure to pin map, before userspace part exits? > Is this problematic in any way? I don't think it should in the struct_ops case. I think of the struct_ops map is a set of progs "attach" to a subsystem (tcp_cong in this case) and this map-progs stay (or keep attaching) until it is detached. Like other attached bpf_prog keeps running without caring if the bpf_prog is pinned or not. About the "bool unreg;", the default can be changed to true if it makes more sense. [ ... ] > > > + > > + kern_data = st_ops->kern_vdata + st_ops->kern_func_off[i]; > > + *(unsigned long *)kern_data = prog_fd; > > + } > > + > > + map_attr.map_type = BPF_MAP_TYPE_STRUCT_OPS; > > + map_attr.key_size = sizeof(unsigned int); > > + map_attr.value_size = st_ops->kern_vtype_size; > > + map_attr.max_entries = 1; > > + map_attr.btf_fd = btf__fd(obj->btf); > > + map_attr.btf_vmlinux_value_type_id = st_ops->kern_vtype_id; > > + map_attr.name = st_ops->var_name; > > + > > + fd = bpf_create_map_xattr(&map_attr); > > we should try to reuse bpf_object__init_internal_map(). This will add > struct bpf_map which users can iterate over and look up by name, etc. > We had similar discussion when Daniel was adding global data maps, > and we conclusively decided that these special maps have to be > represented in libbpf as struct bpf_map as well. I will take a look. > > > + if (fd < 0) { > > + err = -errno; > > + pr_warn("struct_ops register %s: Error in creating struct_ops map\n", > > + tname); > > + return err; > > + } > > + > > + err = bpf_map_update_elem(fd, &zero, st_ops->kern_vdata, 0);