Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes: > Now for XDP. It has same flawed model. And even if it seems to you > that it's not a big issue, and even if Jakub thinks we are trying to > solve non-existing problem, it is a real problem and a real concern > from people that have to support XDP in production with many > well-meaning developers developing BPF applications independently. > Copying what you wrote in another thread: > >> Setting aside the question of which is the best abstraction to represent >> an attachment, it seems to me that the actual behavioural problem (XDP >> programs being overridden by mistake) would be solvable by this patch, >> assuming well-behaved userspace applications. > > ... this is a horrible and unrealistic assumption that we just cannot > make and accept. However well-behaved userspace applications are, they > are written by people that make mistakes. And rather than blissfully > expect that everything will be fine, we want to have enforcements in > place that will prevent some buggy application to wreck havoc in > production. Look, I'm not trying to tell you how to managed your internal systems. I'm just objecting to your assertion that your deployment model is the only one that can possibly work, and the refusal to consider other alternatives that comes with it. >> You're saying that like we didn't already have the netlink API. We >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY, >> this is just adding LINK_UPDATE. It's a straight-forward fix of an >> existing API; essentially you're saying we should keep the old API in a >> crippled state in order to promote your (proposed) new API. > > This is the fundamental disagreement that we seem to have. XDP's BPF > program attachment is not in any way equivalent to bpf_link. So no, > netlink API currently doesn't have anything that's close to bpf_link. > Let me try to summarize what bpf_link is and what are its fundamental > properties regardless of type of BPF programs. First of all, thank you for this summary; that is very useful! > 1. bpf_link represents a connection (pairing?) of BPF program and some > BPF hook it is attached to. BPF hook could be perf event, cgroup, > netdev, etc. It's a completely independent object in itself, along the > bpf_map and bpf_prog, which has its own lifetime and kernel > representation. To user-space application it is returned as an > installed FD, similar to loaded BPF program and BPF map. It is > important that it's not just a BPF program, because BPF program can be > attached to multiple BPF hooks (e.g., same XDP program can be attached > to multiple interface; same kprobe handler can be installed multiple > times), which means that having BPF program FD isn't enough to > uniquely represent that one specific BPF program attachment and detach > it or query it. Having kernel object for this allows to encapsulate > all these various details of what is attached were and present to > user-space a single handle (FD) to work with. For XDP there is already a unique handle, it's just implicit: Each netdev can have exactly one XDP program loaded. So I don't really see how bpf_link adds anything, other than another API for the same thing? > 2. Due to having FD associated with bpf_link, it's not possible to > talk about "owning" bpf_link. If application created link and never > shared its FD with any other application, it is the sole owner of it. > But it also means that you can share it, if you need it. Now, once > application closes FD or app crashes and kernel automatically closes > that FD, bpf_link refcount is decremented. If it was the last or only > FD, it will trigger automatica detachment and clean up of that > particular BPF program attachment. Note, not a clean up of BPF > program, which can still be attached somewhere else: only that > particular attachment. This behaviour is actually one of my reservations against bpf_link for XDP: I think that automatically detaching XDP programs when the FD is closed is very much the wrong behaviour. An XDP program processes packets, and when loading one I very much expect it to keep doing that until I explicitly tell it to stop. > 3. This derives from the concept of ownership of bpf_link. Once > bpf_link is attached, no other application that doesn't own that > bpf_link can replace, detach or modify the link. For some cases it > doesn't matter. E.g., for tracing, all attachment to the same fentry > trampoline are completely independent. But for other cases this is > crucial property. E.g., when you attach BPF program in an exclusive > (single) mode, it means that particular cgroup and any of its children > cgroups can have any more BPF programs attached. This is important for > container management systems to enforce invariants and correct > functioning of the system. Right now it's very easy to violate that - > you just go and attach your own BPF program, and previous BPF program > gets automatically detached without original application that put it > there knowing about this. Chaos ensues after that and real people have > to deal with this. Which is why existing > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding > bpf_link support. I can totally see how having an option to enforce a policy such as locking out others from installing cgroup BPF programs is useful. But such an option is just that: policy. So building this policy in as a fundamental property of the API seems like a bad idea; that is effectively enforcing policy in the kernel, isn't it? > Those same folks have similar concern with XDP. In the world where > container management installs "root" XDP program which other user > applications can plug into (libxdp use case, right?), it's crucial to > ensure that this root XDP program is not accidentally overwritten by > some well-meaning, but not overly cautious developer experimenting in > his own container with XDP programs. This is where bpf_link ownership > plays a huge role. Tupperware agent (FB's container management agent) > would install root XDP program and will hold onto this bpf_link > without sharing it with other applications. That will guarantee that > the system will be stable and can't be compromised. See this is where we get into "deployment-model specific territory". I mean, sure, in the "central management daemon" model, it makes sense that no other applications can replace the XDP program. But, erm, we already have a mechanism to ensure that: Just don't grant those applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to add anything other than a different way to do the same thing? Additionally, in the case where there is *not* a central management daemon (i.e., what I'm implementing with libxdp), this would be the flow implemented by the library without bpf_link: 1. Query kernel for current BPF prog loaded on $IFACE 2. Sanity-check that this program is a dispatcher program installed by libxdp 3. Create a new dispatcher program with whatever changes we want to do (such as adding another component program). 4. Atomically replace the old program with the new one using the netlink API in this patch series. Whereas with bpf_link, it would be: 1. Find the pinned bpf_link for $IFACE (e.g., load from /sys/fs/bpf/iface-links/$IFNAME). 2. Query kernel for current BPF prog linked to $LINK 3. Sanity-check that this program is a dispatcher program installed by libxdp 4. Create a new dispatcher program with whatever changes we want to do (such as adding another component program). 5. Atomically replace the old program with the new one using the LINK_UPDATE bpf() API. So all this does is add an additional step, and another dependency on bpffs. And crucially, I really don't see how the "bpf_link is the only thing that is not fundamentally broken" argument holds up. -Toke