On Wed, Mar 04, 2020 at 01:24:39PM -0800, Jakub Kicinski wrote:
> On Wed, 4 Mar 2020 12:45:07 -0800 Alexei Starovoitov wrote:
> > On Wed, Mar 04, 2020 at 11:41:58AM -0800, Jakub Kicinski wrote:
> > > On Tue, 3 Mar 2020 20:36:45 -0800 Alexei Starovoitov wrote:
> > > > > > libxdp can choose to pin it in some libxdp specific location, so other
> > > > > > libxdp-enabled applications can find it in the same location, detach,
> > > > > > replace, modify, but random app that wants to hack an xdp prog won't
> > > > > > be able to mess with it.
> > > > >
> > > > > What if that "random app" comes first, and keeps holding on to the link
> > > > > fd? Then the admin essentially has to start killing processes until they
> > > > > find the one that has the device locked, no?
> > > >
> > > > Of course not. We have to provide an api to make it easy to discover
> > > > what process holds that link and where it's pinned.
> > >
> > > That API to discover ownership would be useful but it's on the BPF side.
> >
> > it's on bpf side because it's bpf specific.
> >
> > > We have netlink notifications in networking world. The application
> > > which doesn't want its program replaced should simply listen to the
> > > netlink notifications and act if something goes wrong.
> >
> > instead of locking the bike let's setup a camera and monitor the bike
> > when somebody steals it.
> > and then what? chase the thief and bring the bike back?
>
> :) Is the bike the BPF program? It's more like thief is stealing our
> parking spot, we still have the program :)

yeah. parking spot is a better analogy.

> Maybe also the thief should not have CAP_ADMIN in the first place?
> And ask a daemon to perform its actions..

a daemon idea keeps coming back in circles.
With FD-based kprobe/uprobe/tracepoint/fexit/fentry that problem is gone,
but xdp, tc, cgroup still don't have the owner concept.
Some people argued that these three need three separate daemons.
Especially since cgroups are mainly managed by systemd plus container manager
it's quite different from networking (xdp, tc) where something like 'networkd'
might make sense.
But if you take this line of thought all the way, systemd should be that single
daemon to coordinate attaching to xdp, tc, cgroup because in many cases
cgroup and tc progs have to coordinate the work.
And that's where it's getting gloomy... unless the kernel can provide a facility
so central daemon is not necessary.

> > current xdp, tc, cgroup apis don't have the concept of the link
> > and owner of that link.
>
> Why do the attachment points have to have a concept of an owner and
> not the program itself?

bpf program is an object. That object has an owner or multiple owners.
A user process that holds a pointer to that object is a shared owner.
FD is such pointer. FD == std::shared_ptr<bpf_prog>.
Holding that pointer guarantees that <bpf_prog> will not disappear,
but it says nothing about whether the program will keep running.
For [ku]probe,tp,fentry,fexit there was always <bpf_link> in the kernel.
It wasn't that formal in the past until Andrii's most recent patches,
but the concept existed for a long time.
FD == std::shared_ptr<bpf_link> connects a kernel object with <bpf_prog>.
When that kernel object emits an event the <bpf_link> guarantees that
<bpf_prog> will be executed.

For cgroups we don't have such concept. We thought that three attach modes we
introduced (default, allow-override, allow-multi) will cover all use cases.
But in practice it turned out that it only works when there is a central daemon
for _all_ cgroup-bpf progs in the system, otherwise different processes step
on each other. More so there has to be a central diff-review human authority,
otherwise teams step on each other. That sort-of works within one org, but
doesn't scale.
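
To make the FD == std::shared_ptr<bpf_prog> point concrete, here is a minimal
user-space sketch in plain C++. It is only a model of the ownership argument,
not kernel or libbpf code: BpfProg, Hook and prog_fd_without_link() are
hypothetical names made up for illustration, and Hook stands in for a
cgroup/tc/xdp attach point that any process may overwrite.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a loaded BPF program (not a kernel type).
struct BpfProg {
    std::string name;
    int runs = 0;
    explicit BpfProg(std::string n) : name(std::move(n)) {}
};

// Hypothetical attach point without any owner concept: whoever attaches
// last wins, and the previously attached prog is silently dropped.
struct Hook {
    std::shared_ptr<BpfProg> attached;
    void attach(std::shared_ptr<BpfProg> p) { attached = std::move(p); }
    void fire() { if (attached) ++attached->runs; }
};

struct ProgFdOutcome {
    int runs_while_attached;   // events seen before the replacement
    int runs_after_replace;    // events seen in total once replaced
    long owners_after_replace; // remaining shared owners of our prog
};

ProgFdOutcome prog_fd_without_link() {
    Hook cgroup_hook;

    // "FD" held by our process: a shared owner of the prog object.
    auto my_fd = std::make_shared<BpfProg>("mine");
    cgroup_hook.attach(my_fd);
    cgroup_hook.fire();
    int before = my_fd->runs;

    // Another process attaches its own prog to the same hook.
    auto other_fd = std::make_shared<BpfProg>("other");
    cgroup_hook.attach(other_fd);
    cgroup_hook.fire();
    cgroup_hook.fire();

    // Our prog object is still alive (we hold the pointer), but it no
    // longer runs: holding the prog says nothing about execution.
    return ProgFdOutcome{before, my_fd->runs, my_fd.use_count()};
}
```

Calling prog_fd_without_link() from a main() shows the parking spot problem:
the prog survives because the FD is still held, yet the run counter stops
growing as soon as somebody else attaches.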

To avoid making systemd a central place to coordinate attaching xdp, tc, cgroup
progs the kernel has to provide a mechanism for an application to connect a
kernel object with a prog and hold the ownership of that link so that no other
process in the system can break that connection.
That kernel object is cgroup, qdisc, netdev.

Interesting question comes when that object disappears. What to do with the
link? Two ways to solve it:
1. make link hold the object, so it cannot be removed.
2. destroy the link when object goes away.
Both have pros and cons as I mentioned earlier. And that's what's to be decided.

I think the truth is somewhat in the middle. The link has to hold the object,
so it doesn't disappear from under it, but get notified on deletion, so the
link can be self destroyed. From the user point of view the execution guarantee
is still preserved. The kernel object was removed and the link has one dangling
side. Note this behavior is vastly different from existing xdp, tc, cgroup
behavior where both object and bpf prog can be alive, but connection is gone
and execution guarantee is broken.
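
The "somewhat in the middle" behavior can be sketched in user-space C++ as
well. This is a rough model under stated assumptions, not the kernel
implementation: Prog, Target, Link, link_create() and demo_link_lifetime() are
hypothetical names, Target stands in for a cgroup/qdisc/netdev, and the sketch
allows only one link per target to keep it short.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Prog { int runs = 0; };  // hypothetical stand-in for a bpf prog
struct Link;

// Hypothetical kernel object (cgroup, qdisc, netdev).
struct Target {
    bool removed = false;
    Link* link = nullptr;  // the single link owning this attach point
    void fire();           // object emits an event
    void remove();         // object is deleted by the admin
};

struct Link {
    std::shared_ptr<Target> target;  // holds the object while attached
    std::shared_ptr<Prog> prog;
    Link(std::shared_ptr<Target> t, std::shared_ptr<Prog> p)
        : target(std::move(t)), prog(std::move(p)) {}
    // Closing the link is the only way to break the connection.
    ~Link() { if (target) target->link = nullptr; }
    bool dangling() const { return target == nullptr; }
    // Deletion notification: drop the object side, keep the link object.
    void on_target_removed() { target.reset(); }
};

void Target::fire() { if (!removed && link) ++link->prog->runs; }

void Target::remove() {
    removed = true;
    Link* l = link;
    link = nullptr;
    if (l) l->on_target_removed();  // may drop the link's reference to us
}

// Returns nullptr when the target is gone or already owned by a link.
std::shared_ptr<Link> link_create(std::shared_ptr<Target> t,
                                  std::shared_ptr<Prog> p) {
    if (t->removed || t->link) return nullptr;
    auto l = std::make_shared<Link>(t, std::move(p));
    t->link = l.get();
    return l;
}

struct LinkOutcome {
    bool second_attach_refused;
    int runs;
    bool dangling_after_remove;
    long target_refs_after_remove;
};

LinkOutcome demo_link_lifetime() {
    auto netdev = std::make_shared<Target>();
    auto prog = std::make_shared<Prog>();
    auto link = link_create(netdev, prog);  // our "link FD"
    auto intruder = link_create(netdev, std::make_shared<Prog>());
    netdev->fire();
    netdev->fire();    // prog runs on every event while the link exists
    netdev->remove();  // object deleted: link is notified and lets go
    netdev->fire();    // no event reaches the prog after removal
    return LinkOutcome{intruder == nullptr, prog->runs, link->dangling(),
                       netdev.use_count()};
}
```

In this model a second attach attempt is refused while the link is held, every
event up to the removal reaches the prog, and after removal the link stays
valid with one dangling side instead of pinning the object forever.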