Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@xxxxxxxxx> writes:
>
> > On 2021-02-15 21:49, John Fastabend wrote:
> >> Maciej Fijalkowski wrote:
> >>> Currently, if there are multiple xdpsock instances running on a single
> >>> interface and in case one of the instances is terminated, the rest of
> >>> them are left in an inoperable state due to the fact of unloaded XDP
> >>> prog from interface.
> >>>
> >>> To address that, step away from setting bpf prog in favour of bpf_link.
> >>> This means that refcounting of BPF resources will be done automatically
> >>> by bpf_link itself.
> >>>
> >>> When setting up BPF resources during xsk socket creation, check whether
> >>> bpf_link for a given ifindex already exists via set of calls to
> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> >>> and comparing the ifindexes from bpf_link and xsk socket.
> >>>
> >>> If there's no bpf_link yet, create one for a given XDP prog and unload
> >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> >>>
> >>> If bpf_link is already at a given ifindex and underlying program is not
> >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> >>> XDP_FLAGS_UPDATE_IF_NOEXIST.
> >>>
> >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
> >>> ---
[...]
> >>> -	ctx->prog_fd = prog_fd;
> >>> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> >>> +	if (link_fd < 0) {
> >>> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> >>> +		return link_fd;
> >>> +	}
> >>> +
> >>
> >> This can leave the system in a bad state where it unloaded the XDP program
> >> above, but then failed to create the link. So we should somehow fix that
> >> if possible or at minimum put a note somewhere so users can't claim they
> >> shouldn't know this.
> >>
> >> Also related, its not good for real systems to let XDP program go missing
> >> for some period of time.
> >> I didn't check but we should make
> >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> >>
> >
> > This is the default for XDP sockets library. The
> > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
> > One could maybe argue that the "force remove" would be out of scope for
> > AF_XDP; Meaning that if an XDP program is running, attached via netlink,
> > the AF_XDP library simply cannot remove it. The user would need to rely
> > on some other mechanism.
>
> Yeah, I'd tend to agree with that. In general, I think the proliferation
> of "just force-remove (or override) the running program" in code and
> instructions has been a mistake; and application should only really be
> adding and removing its own program...
>
> -Toke
>

I'll try to consolidate some of my opinions from a couple of threads here. It
looks to me like many of these issues are self-inflicted by the API. We built
the API without the right abstractions, or at least with different
abstractions from the rest of the BPF APIs. Not too surprising, seeing that
the kernel side and user side were all being built up at once.

For example, this specific issue where the xsk API also deletes the XDP
program seems to be due to merging the xsk setup with the creation of the XDP
programs.

I'm not a real user of AF_XDP (yet), but here is how I would expect it to
work based on how the sockmap pieces work, which are somewhat similar given
they also deal with sockets.

Program
(1) load and pin an XDP BPF program
    - obj = bpf_object__open(prog);
    - bpf_object__load_xattr(&attr);
    - bpf_program__pin()
(2) pin the map, find map_xsk using any of the map APIs
    - bpf_map__pin(map_xsk, path_to_pin)
(3) attach to XDP
    - link = bpf_program__attach_xdp()
    - bpf_link__pin()

At this point you have a BPF program loaded, an xsk map, and a link all
pinned and ready. And we can add socks using the process found in
`enter_xsks_into_map` in the sample.
This can be the same program that loaded/pinned the XDP program or some
other program; it doesn't really matter.

 - create xsk fd
     . xsk_umem__create()
     . xsk_socket__create
 - open map @ pinned path
 - bpf_map_update_elem(xsks_map, &key, &fd, 0);

Then it looks like we don't have any conflicts? The XDP program is pinned
and exists in its normal scope. The xsk elements can be added/deleted
as normal. If the XDP program is removed and the map's reference count
(using normal ref rules) reaches zero, it's also deleted. The above is more or
less the same flow we use for any BPF program, so it looks good to me.

The trouble seems to pop up when using the higher abstraction APIs,
xsk_setup_xdp_prog and friends, I guess? I just see the above as already
fairly easy to use, and it looks like we have good helpers to create the
sockets. Maybe I missed some design considerations. IMO higher level
abstractions should go in the new libxdp and the above should stay in libbpf.

/rant off ;)

Thanks,
John
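
The refcounting argument from the quoted commit message can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: `toy_iface`, `toy_legacy_close`, `toy_link_get`, `toy_link_put` and `toy_survives_one_exit` are hypothetical names invented for illustration, and none of them are kernel or libbpf APIs. The model compares what happens to the XDP program when one of two sockets sharing an interface exits, first with the direct-attach scheme and then with a refcounted link.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types and functions are hypothetical and are NOT
 * kernel or libbpf APIs. An interface either has its XDP program detached
 * by whichever socket exits first (old scheme) or keeps it attached for as
 * long as a refcounted link has holders (bpf_link-like scheme). */
struct toy_iface {
	bool prog_attached;   /* is an XDP program currently attached? */
	int link_refs;        /* open references to the modelled link */
};

/* Old scheme: any socket teardown detaches the program outright. */
void toy_legacy_close(struct toy_iface *ifc)
{
	ifc->prog_attached = false;
}

/* Link scheme: the first holder attaches, later holders only take a ref. */
void toy_link_get(struct toy_iface *ifc)
{
	if (ifc->link_refs++ == 0)
		ifc->prog_attached = true;
}

/* Link scheme: the program is detached only when the last ref is dropped. */
void toy_link_put(struct toy_iface *ifc)
{
	assert(ifc->link_refs > 0);
	if (--ifc->link_refs == 0)
		ifc->prog_attached = false;
}

/* Two sockets share the interface and one of them exits. Returns whether
 * the program is still attached for the remaining socket. */
bool toy_survives_one_exit(bool use_link)
{
	struct toy_iface ifc = { .prog_attached = false, .link_refs = 0 };

	if (use_link) {
		toy_link_get(&ifc);         /* socket A */
		toy_link_get(&ifc);         /* socket B */
		toy_link_put(&ifc);         /* socket A exits */
	} else {
		ifc.prog_attached = true;   /* A attaches, B reuses it */
		toy_legacy_close(&ifc);     /* socket A exits */
	}
	return ifc.prog_attached;
}
```

In this model, `toy_survives_one_exit(false)` leaves the surviving socket without a program, which is the failure described at the top of the thread, while `toy_survives_one_exit(true)` keeps the program attached until the last holder drops its reference.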