David Vernet <void@xxxxxxxxxxxxx> writes:

> On Wed, Feb 01, 2023 at 06:44:48PM +0100, Toke Høiland-Jørgensen wrote:
>> Following up on the discussion at the BPF office hours (and subsequent
>> discussion), this patch adds a description of API stability expectations
>> for kfuncs. The goal here is to manage user expectations about what kind of
>> stability can be expected for kfuncs exposed by the kernel.
>>
>> Since the traditional BPF helpers are basically considered frozen at this
>> point, kfuncs will be the way all new functionality will be exposed to BPF
>> going forward. This makes it important to document their stability
>> guarantees, especially since the perception up until now has been that
>> kfuncs should always be considered "unstable" in the sense of "may go away
>> or change at any time". Which in turn makes some users reluctant to use
>> them because they don't want to rely on functionality that may be removed
>> in future kernel versions.
>>
>> This patch adds a section to the kfuncs documentation outlining how we as a
>> community think about kfunc stability. The description is a bit vague and
>> wishy-washy at times, but since there does not seem to be consensus to
>> commit to any kind of hard stability guarantees at this point, I fear this
>> is the best we can do.
>>
>> I put this topic on the agenda again for tomorrow's office hours, but
>> wanted to send this out ahead of time, to give people a chance to read it
>> and think about whether it makes sense or if there's a better approach.
>>
>> Previous discussion:
>> https://lore.kernel.org/r/20230117212731.442859-1-toke@xxxxxxxxxx
>
> Again, thanks a lot for writing this down and getting a real / tangible
> conversation started.

You're welcome! Just a few quick notes on one or two points below, we
can continue the discussion at the office hours:

[..]

> While I certainly understand the sentiment, I personally don't think I'd
> describe this as the BPF community striking a balance in a way that
> differs from EXPORT_SYMBOL_GPL. At the end of the day, as Alexei said in
> [0], BPF APIs must never be a reason to block a change elsewhere in the
> kernel.
>
> [0]: https://lore.kernel.org/bpf/20230119043247.tktxsztjcr3ckbby@MacBook-Pro-6.local/

"Block" is not the same as "delay", though. If we wanted, we *could*
commit to a stronger guarantee, via the deprecation procedure. E.g.,
"kfuncs will never go away without first being marked as deprecated for
at least X kernel releases". Yes, this will add some friction to
development, but as long as this is stated up-front, subsystems could
make an informed choice when choosing whether to expose something via a
kfunc. I don't think this is necessarily a bad idea, either, it would
enforce some discipline and make people think deeply about the API when
exposing something. Just like people do today when adding new UAPI, but
without the *indefinite* stability guarantees (i.e., mistakes can be
fixed, eventually).

[...]

> To that point, I would argue that a kfunc's stability guarantees are
> truly exactly the same as for any EXPORT_SYMBOL_GPL symbol. The only
> differences between the two are that:
>
> a) In some ways it's quite a bit easier to support kfunc stability
>    thanks to CO-RE.
>
> b) It's more difficult to gauge how widely used a kfunc is in comparison
>    to an EXPORT_SYMBOL_GPL consumer due to the fact that (at this point,
>    far) fewer BPF programs are upstreamed in comparison to modules.
>
> Unfortunately, the fact that most BPF programs live outside the tree is
> irrelevant to any obligation the kernel might have to accommodate
> them.

I don't really think it's true that it's exactly the same as for
EXPORT_SYMBOL_GPL.
For BPF programs, we *do* care about out-of-tree users, and we want to
provide them reasonable stability guarantees, and functionality being
actively used is a signal we do take into account when deciding whether
to change a kfunc. Whereas with EXPORT_SYMBOL_GPL, people are told in no
uncertain terms to go away if they complain about a symbol change that
impacts an out-of-tree module.

> It would be great if we upstreamed more BPF programs, in which case it
> would be easier to change them when kfuncs change, and the signal of
> how widely used a kfunc is would be stronger and more self-evident.
> Perhaps it's worth calling that out as part of this doc as well?

I don't think this is really feasible. As just one example, this would
imply that all the different BPF-enabled Kubernetes CNIs (Cilium, etc)
should live in the kernel tree, and subsystem authors should go in and
update their kfunc calls if they change on the kernel side. That's
hardly realistic? :)

-Toke