On Wed, 25 Mar 2020 17:16:13 -0700 Andrii Nakryiko wrote: > > >> Well, I wasn't talking about any of those subsystems, I was talking > > >> about networking :) > > > > > > So it's not "BPF subsystem's relation to the rest of the kernel" from > > > your previous email, it's now only "talking about networking"? Since > > > when the rest of the kernel is networking? > > > > Not really, I would likely argue the same for any other subsystem, I > > And you would like lose that argument :) You already agreed that for > tracing this is not the case. BPF is not attached by writing text into > ftrace's debugfs entries. Same for cgroups, we don't > create/update/write special files in cgroupfs, we have an explicit > attachment API in BPF. > > BTW, kprobes started out with the same model as XDP has right now. You > had to do a bunch of magic writes into various debugfs files to attach > BPF program. If user-space application crashed, kprobe stayed > attached. This was horrible and led to many problems in real world > production uses. So a completely different interface was created, > allowing to do it through perf_event_open() and created anonymous > inode for BPF program attachment. That allowed crashing program to > auto-detach kprobe and not harm production use case. > > Now we are coming after cgroup BPF programs, which have similar issues > and similar pains in production. cgroup BPF progs actually have extra > problems: programs can user-space applications can accidentally > replace a critical cgroup program and ruin the day for many folks that > have to deal with production breakage after that. Which is why I'm > implementing bpf_link with all its properties: to solve real pain and > real problem. > > Now for XDP. It has same flawed model. And even if it seems to you > that it's not a big issue, and even if Jakub thinks we are trying to > solve non-existing problem, it is a real problem and a real concern > from people that have to support XDP in production with many More than happy to talk to those folks, and see the tickets. Toke has actual user space code which needs his extension, and for which "ownership" makes no difference as it would just be passed with whoever touched the program last. > well-meaning developers developing BPF applications independently. There is one single program which can be attached to the XDP hook, the "everybody attaches their program model" does not apply. TW agent should just listen on netlink notifications to see if someone replaced its program. cgroups have multi-attachment and no notifications (although not sure anyone was explicitly asking for links there, either). In production a no-op XDP program is likely to be attached from the moment machine boots, to avoid traffic interruption and the risk of something going wrong with the driver when switching between skb to xdp datapath. And then the program is only replaced, not detached. Not to mention the fact that networking applications generally don't want to remove their policy from the kernel when they crash :/ > Now, those were fundamental things, but I'd like to touch on a "nice > things we get with that". Having a proper kernel object representing > single instance of attached BPF program to some other kernel object > allows to build an uniform and consistent API around bpf_link with > same semantics. We can do LINK_UPDATE and allow to atomically replace > BPF program inside the established bpf_link. It's applicable to all > types of BPF program attachment and can be done in a way that ensures > no BPF program invocation is skipped while BPF programs are swapped > (because at the lowest level it boils down to an atomic pointer swap). > Of course not all bpf_links might have this support initially, but > we'll establish a lot of common infrastructure which will make it > simpler, faster and more reliable to add this functionality. XDP replace is already atomic, no packet will be passed without either old or new program executed on it. > And to wrap up. I agree, consistent API is not a goal in itself, as > Jakub mentioned. But it is a worthy goal nevertheless, especially if > it doesn't cost anything extra. It makes kernel developers lives Not sure how having two interfaces instead of one makes kernel developer's life easier.