This reply was resent as the previous email had a missing In-Reply-To in
the header.

> > > On Mon, 2025-02-10 at 16:06 -0800, Andrii Nakryiko wrote:
> > > > Tracking associated maps for a program is not necessary. As long
> > > > as the last BPF program using the BPF map is unloaded, the kernel
> > > > will automatically free not-anymore-referenced BPF map. Note that
> > > > bpf_object itself will keep FDs for BPF maps, so you'd need to
> > > > make sure to do bpf_object__close() to release those references.
> > > >
> > > > But if you are going to ask to re-create BPF maps next time BPF
> > > > program is loaded... Well, I'll say you are asking for a bit too
> > > > much, tbh. If you want to be *that* sophisticated, it shouldn't be
> > > > too hard for you to get all this information from BPF program's
> > > > instructions.
> >
> > We really are that sophisticated (see below for more details). We
> > could scan program instructions, but we'd then tie our logic to BPF
> > implementation details and duplicate logic already present in libbpf
> > (https://elixir.bootlin.com/linux/v6.13.2/source/tools/lib/bpf/libbpf.c#L6087).
> > Obviously this *can* be done but it's not at all ideal from an
> > application perspective.
>
> I agree it's not ideal, but it's also not some complicated and
> bound-to-be-changed logic. What you point out in libbpf source code is
> a bit different thing, reality is much simpler. Only so-called ldimm64
> instruction (BPF_LD | BPF_IMM | BPF_DW opcode) can be referencing map
> FD, so analysing this is borderline trivial. And this is part of BPF
> ISA, so not going to change.

Our approach is to associate an array of maps with each BPF program as a
property that is initialised at the relocation stage. So we do not need
to parse BPF program instructions; instead, we rely on the recorded
relocations. We think this is a more robust and cleaner solution, with
the advantage that all the code lives in one place and operates at the
higher level of abstraction of the relocation table.

Mainline libbpf keeps an array of maps per bpf_object; we extended this
by adding an array of maps associated with each bpf_program. For
example, here is a code excerpt from our development branch which
associates a map with a bpf_program at the relocation phase:

	insn[0].src_reg = BPF_PSEUDO_MAP_FD;
	insn[0].imm = map->fd;
	err = bpf_program__add_map(prog, map);
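To make the shape of that extension a bit more concrete, here is a
minimal sketch of a per-program map list as it could be kept alongside a
bpf_program; prog_maps and prog_maps__add are illustrative names only,
not our actual implementation and not a libbpf API:

	#include <errno.h>
	#include <stdlib.h>
	#include <bpf/libbpf.h>

	/* Illustrative per-program map list, filled in while processing
	 * relocations: one entry per distinct map the program references. */
	struct prog_maps {
		const struct bpf_program *prog;
		struct bpf_map **maps;
		size_t nr_maps;
		size_t cap;
	};

	static int prog_maps__add(struct prog_maps *pm, struct bpf_map *map)
	{
		size_t i;

		/* A program may reference the same map many times; record it once. */
		for (i = 0; i < pm->nr_maps; i++)
			if (pm->maps[i] == map)
				return 0;

		if (pm->nr_maps == pm->cap) {
			size_t new_cap = pm->cap ? pm->cap * 2 : 4;
			struct bpf_map **tmp;

			tmp = realloc(pm->maps, new_cap * sizeof(*tmp));
			if (!tmp)
				return -ENOMEM;
			pm->maps = tmp;
			pm->cap = new_cap;
		}
		pm->maps[pm->nr_maps++] = map;
		return 0;
	}

In our branch bpf_program__add_map() plays essentially this role, with
the array attached to the bpf_program itself.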
> > > > bpf_object is the unit of coherence in libbpf, so I don't see us
> > > > refcounting maps between bpf_objects. Kernel is doing refcounting
> > > > based on FDs, so see if you can use that.
> >
> > I can understand that. That said, I think if there's no logic across
> > objects, and bpf_object access is not thread-safe, it puts us into a
> > tough situation:
> > - Complex refcounting, code scanning, etc to keep consistency when
> >   manipulating maps used by multiple programs.
> > - Parallel loading not being well-balanced, if we split programs
> >   across objects.
> >
> > We could alternatively write our own custom loader, but then we’d have
> > to duplicate much of the useful logic that libbpf already implements:
> > skeleton generation, map/program association, embedding programs into
> > ELFs, loading logic and kernel probing, etc. We’d like some way to
> > handle dynamic/parallel loading without having to replicate all the
> > advantages libbpf grants us.
>
> Yeah, I can understand that as well, but bpf_object's single-threaded
> design and the fact that bpf_object__load is kind of the final step
> where programs are loaded (or not) is pretty backed in. I don't see
> bpf_object becoming multi-threaded. The dynamic program
> loading/unloading/loading again is something that I can't yet justify,
> tbh.

We understood this, but the current bpf_object design allowed us to use
it in a multithreaded environment with only a minor modification to
bpf_program loading. We also understand that the design choice of
keeping libbpf single-threaded is unlikely to be reconsidered.

> So the best I can propose you is to use libbpf's skeleton and
> bpf_object concept for, effectively, ELF handling, relocations, all
> the preparations up to loading BPF programs. And after that you can
> take over loading and handling program lifetime outside of bpf_object.
>
> Dynamic map creation after bpf_object__load() I think is completely
> outside of the scope and you'll have to solve this problem for
> yourself. I would point out, though, that internally libbpf already
> switched to sort-of pre-creating stable FDs for maps before they are
> actually created in the kernel. So it's conceivable that we can have
> more granularity in bpf_object preparation. I.e., first step would be
> to parse ELF and handle relocations, prepare everything. After that we
> can have a step to create maps, and then another one to create
> programs. Usually people would do all that, but you can stop right
> before maps creation or before program creation, whatever fits your
> use case better.
>
> The key is that program instructions will be final and won't need
> adjustments regardless of maps actually being created or not. FDs, as
> I mentioned, are stable regardless.

We used this in our design, so we did not need to scan BPF program
instructions to fix up the map FDs referenced by the instructions of a
dynamically loaded bpf_program with dynamically created maps.
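For reference, the kind of instruction fix-up that the recorded
relocations let us avoid would look roughly like the sketch below, based
on the ldimm64 facts mentioned above; fixup_map_fds() and the
caller-supplied translate_fd() callback are hypothetical names, not code
from our tree:

	#include <errno.h>
	#include <stddef.h>
	#include <linux/bpf.h>

	/* Sketch: rewrite map FDs embedded in a program's ldimm64
	 * instructions. Only BPF_LD | BPF_IMM | BPF_DW can reference a map
	 * FD (src_reg is BPF_PSEUDO_MAP_FD, or BPF_PSEUDO_MAP_VALUE for a
	 * map value pointer); it occupies two instruction slots. */
	static int fixup_map_fds(struct bpf_insn *insns, size_t insn_cnt,
				 int (*translate_fd)(int old_fd))
	{
		size_t i;

		for (i = 0; i < insn_cnt; i++) {
			struct bpf_insn *insn = &insns[i];

			if (insn->code != (BPF_LD | BPF_IMM | BPF_DW))
				continue;

			if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
			    insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
				int new_fd = translate_fd(insn->imm);

				if (new_fd < 0)
					return -EINVAL;
				insn->imm = new_fd;
			}
			i++;	/* skip the second half of the ldimm64 */
		}
		return 0;
	}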
> > The use case here is that our security monitoring agent leverages
> > eBPF as its foundational technology to gather telemetry from the
> > kernel. As part of that, we hook many different kernel subsystems
> > (process, memory, filesystem, network, etc), tying them together and
> > tracking with maps. So we legitimately have a very large number of
> > programs all doing different work. For products of this scale, it
> > increases security and performance to load this set of programs and
> > their maps in an optimized, parallel fashion and subsequently change
> > the loaded set of programs and maps dynamically without disturbing
> > the rest of the application.
>
> Yes, makes sense. You'll need to decide for yourself if it's actually
> more meaningful to split those 200 programs into independent
> bpf_objects by features, and be rigorous about sharing state (maps)
> through bpf_map__reuse_fd(), which would allow to parallelize loading
> within confines of existing libbpf APIs. Or you can be a bit more
> low-level with program loading outside of bpf_object API, as I
> described above.

Yes, this can be one way to share BPF maps across multiple bpf_objects
and to use existing libbpf for parallel BPF program loading if we want
to keep full libbpf compatibility, but at the cost of a more complicated
design, as we would need to convert the single-bpf_object model into
multiple bpf_objects plus a new layer that manages them. In our case,
since a bpf_program can map to multiple features that can be modified
independently, and to achieve even load balancing across multiple
threads, it would probably end up being one bpf_program per bpf_object.
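For completeness, a minimal sketch of the bpf_map__reuse_fd() sharing
pattern mentioned above, assuming two illustrative object files,
feature_a.bpf.o and feature_b.bpf.o, that both define a map named
shared_map (names are made up, error handling and cleanup are trimmed):

	#include <bpf/libbpf.h>

	/* Sketch: share one map between two bpf_objects by pointing the
	 * second object's map at the FD created by the first. */
	static int load_two_features(void)
	{
		struct bpf_object *a, *b;
		struct bpf_map *map_a, *map_b;
		int err;

		a = bpf_object__open_file("feature_a.bpf.o", NULL);
		b = bpf_object__open_file("feature_b.bpf.o", NULL);
		if (!a || !b)
			return -1;

		/* Loading the first object creates its maps in the kernel. */
		err = bpf_object__load(a);
		if (err)
			return err;

		map_a = bpf_object__find_map_by_name(a, "shared_map");
		map_b = bpf_object__find_map_by_name(b, "shared_map");
		if (!map_a || !map_b)
			return -1;

		/* Before loading the second object, reuse the existing map
		 * instead of creating a duplicate. */
		err = bpf_map__reuse_fd(map_b, bpf_map__fd(map_a));
		if (err)
			return err;

		return bpf_object__load(b);
	}

With independent objects like this, each bpf_object__load() can run on
its own thread, provided the shared maps are wired up with
bpf_map__reuse_fd() before the dependent objects are loaded.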