On Wed, Mar 25, 2020 at 11:20:05AM -0700, Jakub Kicinski wrote: > On Wed, 25 Mar 2020 11:06:38 -0700 Alexei Starovoitov wrote: > > On Tue, Mar 24, 2020 at 07:15:54PM -0700, Jakub Kicinski wrote: > > > It is the way to configure XDP today, so it's only natural to > > > scrutinize the attempts to replace it. > > > > No one is replacing it. > > You're blocking extensions to the existing API, that means that part > of the API is frozen and is being replaced. two things are wrong in the above stmt: 1. extensions are not frozen in general. 2. api is not being replaced. ownership is lacking. It needs to be added. It's a new concept. Not a replacement. > > > Also I personally don't think you'd see this much push back trying to > > > add bpf_link-based stuff to cls_bpf, that's an add-on. XDP is > > > integrated very fundamentally with the networking stack at this point. > > > > > > > Details are important and every case is different. So imo: > > > > converting ethtool to netlink - great stuff. > > > > converting netdev irq/queue management to netlink - great stuff too. > > > > adding more netlink api for xdp - really bad idea. > > > > > > Why is it a bad idea? > > > > I explained in three other emails. tldr: lack of ownership. > > Those came later, I think, thanks. > > Fine, maybe one day someone will find the extension you're proposing > useful. To me that's not a justification to freeze the existing API > (you said "adding more netlink api for xdp - really bad idea"). > > Besides, if you look at Toke's libxdp work (which exists), what's the > ownership of the attached program? Whichever application touched it > last? > > The whole auto-detachment thing may work nicely in cls_bpf and > sub-programs attached to the root XDP program, but it's a bit hard > to imagine how its useful for the singleton root XDP program. bpf_link introduces two new things: 1. ownership 2. auto-detach They are both useful. Looks like the use case for 2 is obvious, but 1 can exist without being FD based. > > > > There are plenty things which will only be available over netlink. > > > Configuring the interface so installing the XDP program is possible > > > (disabling features, configuring queues etc.). Chances are user gets > > > the ifindex of the interface to attach to over netlink in the first > > > place. The queue configuration (which you agree belongs in netlink) > > > will definitely get more complex to allow REDIRECTs to work more > > > smoothly. AF_XDP needs all sort of netlink stuff. > > > > sure. that has nothing to do with ownership of attachment. > > AFAICT the allure to John is the uniform API, and no need for netlink. > I was explaining how that's a bad goal to have. You clearly misunderstood. Neither John nor I were saying that there is no need for netlink. > > > > Netlink gives us the notification mechanism which is how we solve > > > coordination across daemons (something that BPF subsystem is only > > > now trying to solve). > > > > I don't care about notifications on attachment and no one is trying to > > solve that as far as I can see. It's not a problem to solve in the first place. > > Well, it's the existing solution to the "ownership" problem. > I think most people simply didn't know about it. Toke's set introduces the same thing to XDP as commit 7dd68b3279f1 ("bpf: Support replacing cgroup-bpf program in MULTI mode") did for cgroup-bpf. Both are trying to address the same issue and both are NOT doing. That cgroup-bpf commit looked like a great solution just three month ago. Now it's clear it's not fixing the underlying issue. Same thing with Toke's fix. It feels good now, but going to be uselss without introducing ownership. Why that cgroup-bpf commit not fixing it? Take a look at that commit. The first paragraph is " The common use-case in production is to have multiple cgroup-bpf programs per attach type that cover multiple use-cases. Such programs are attached with BPF_F_ALLOW_MULTI and can be maintained by different people. " Then the description goes into explaining how one service wants to replace its prog. In this case it sort of works because it's single c++ service with multiple progs that do different things. There is a 'centralized daemon' (kinda) that can try to orchestrate. It breaks when there are two c++ services. That replace_bpf_fd is trying to be a link identifier. But the kernel lacks that identifier. I think it would be simpler to understand the ownership if bpf_link had its own IDR for every link. Every attachment(link) would be an object with its own id. We could have iterated over all attachments with GET_NEXT_ID, for example. But that's nice to have. Not strictly necessary. The ownership of the attachment needs to be permanent. It needs to belong to a task and other tasks should not be able to break that attachment. That cgroup-bpf commit addressing part of the issue by "inventing" an identifier for the attachment (in the form of prog_fd that suppose to be there in that attachment), but not addressing the owner part of the attachment. Only the task(s) that own that attachment should be able to modify the attachment. One can imagine how attachment ID can be completely implemented with netlink. Is it good idea? Not really, because there is no mechanism to transfer the ownership. Having an FD that points to a kernel object that represents the ownership makes it easy for user space to pass the ownership (by passing an FD). Auto-detach part comes for free with FD based bpf_link, but that's not the main feature. May be we will add a flag to disable auto-detach too.