On Wed, Jan 04, 2023 at 01:55:32PM -0800, Andrii Nakryiko wrote: [...] > > > Yes, we won't change existing helpers, but we can add new ones if we > > > need to extend them. That's how APIs work. Yes, they need careful > > > considerations when designing and implementing new APIs. Yes, mistakes > > > do happen, that's just fact of life and par for the course of software > > > development. Yes, we have to live with those mistakes. Nothing changed > > > about that. > > > > > > But somehow libraries and kernel still produce stable APIs and > > > maintain them because they clearly provide benefits to end users. > > > > Did you 'live with mistakes done in libbpf 0.x' ? No. > > for a long time yes. And it's not apples to apples comparison, with > library it is possible to deprecate APIs, which is what we did. With > lots of work and gradual transition, but did it. User space <-> kernel is not an apples to apples comparison with kernel <-> BPF programs either. Also, you're using the word "possible" here like it's a foregone conclusion. It is "possible" to deprecate BPF APIs as well, if we start using kfuncs going forward instead of adding to the UAPI boundary. > If we couldn't pull this through, yeah, I would live with whatever > APIs are there. And added new ones as a better replacement. As is > always done for APIs, nothing new here. The point is that you had a choice. > Within 0.x and 1.x APIs are stable and we live with them. This API > stability fear doesn't paralyze libbpf development, we still add new > stable APIs, if they are considered useful and thought through enough. Nobody is claiming that we can't have stable APIs. We're arguing in favor of being able to _choose_ which APIs to deprecate. Using your logic, you wouldn't have been able to deprecate _anything_ for fear of some user, somewhere being affected by it. I understand the sentiment, and I agree that it's very important to have conservative and predictable approaches to deprecation. What I don't think is important is to provide _indefinite_ guarantees for _all_ APIs between two different kernel contexts. And to reiterate, as I've said a few times now but nobody seems to be responding to (unless I missed something), this is for kernel <-> kernel programs. We're not even talking about APIs that are available to user space. Let's at least be clear about the boundaries for which we're debating the merits of stability, because while some user space tooling would certainly affected by choosing to freeze BPF helpers, kfuncs and BPF helpers are ever invoked by _kernel_ programs. > > You've introduced libbpf 1.0 with incompatible api and some users suffereed. > > By "suffered" you mean a few systemd folks being grumpy about this? > And having to do 100 lines of code changes ([0]) to support two > incompatible major versions of libbpf *simultaneously*? > > On the other hand we got a library with saner error propagation > behavior and various API normalizations and additions. Not too bad of > a trade off. This sounds like an argument in favor of why it is acceptable to deprecate some things? Why are some users allowed to feel "pain" (a term you've used in other threads), but other users who are affected by your choices are just "grumpy"? Also, what about the myriad hypothetical users you've never heard of (the ones who we're really protecting with UAPI) who had to deal with breaking API stability changes? > Sure, deprecation is not easy or free, there was a lot of prep work, > and some users had to adjust their code to use new APIs. But this is > quite a tangent. I don't see how this is tangential to the discussion -- it seems very relevant. From my perspective, the core of the discussion has been whether it's acceptable to shift _any_ of the burden of API stability to users. My point, and I believe Alexei's point as well, is that the answer is "it depends and it's a tradeoff", as you've essentially said here. What I'm failing to understand is why your argument that there are tradeoffs applies here, but not for kernel <-> BPF kernel programs? I'm genuinely trying to understand what the distinction is, because from where I'm sitting it feels like we're being selective about when the unknown _threat_ of API instability automatically completely overrides our ability to choose our own deprecation and stability story (a stability story which is informed by our perception of an API's importance, usage, etc). Note that my point here applies to something you've raised on other threads as well, such as on [0] where you (reasonably) reiterated this point: [0]: https://lore.kernel.org/all/CAEf4BzY0aJNGT321Y7Fx01sjHAMT_ynu2-kN_8gB_UELvd7+vw@xxxxxxxxxxxxxx/ > But again. Let me repeat my point *again*. BPF helpers and kfuncs are > not mutually exclusive, both can and should exist and evolve. That's > one of the main points which is somehow eluding this conversation. This is one of the big disconnects for me. If you argue that both BPF helpers and kfuncs can and should continue to coexist indefinitely, it feels like you're arguing for two incompatible points (and please correct me anywhere that I'm unintentionally misrepresenting your perspective here): - On the one hand you're arguing that in some cases, _no_ API instability is acceptable. That in general, the main kernel <-> kernel BPF program API boundary is equivalent to UAPI, and that it's _never_ acceptable for us to ever, _ever_ deprecate certain APIs because _some_ users may be using them, and the possibilty of APIs ever changing or being deprecated will impose an unacceptable pain to users which will make it too difficult to build tooling and, and end up discouraging adoption onto BPF. It seems that you've been making making this argument in favor of what you consider to be "core" BPF helpers such as bpf_dynptr_is_null(), etc. - At the same time, on the other hand, you're arguing that _some_ of the API boundary between kernel <-> BPF program can be unstable. That it's acceptable for _some_ users and _some_ tooling to feel the pain of certain APIs changing. To perhaps extrapolate your point a bit further, you're arguing that niche / non-core kfuncs can be unstable, and that we don't have to worry about the unknown, hypothetical user who would feel pain from having to deal with them being deprecated, because they're not "core". Assuming that's all true, my question is: Why not just give ourselves the _option_ of being able to deem those core helpers as being indefinitely stable for the foreseeable future, and keep the unstable kfuncs to have the same stability guarantees as what they have today? In terms of _stability_ specifically (so ignoring other concerns you've raised, such as that we need BTF and BPF trampoline support for kfuncs -- not because they're irrelevant, but just to keep the discussion focused on stability), what do we gain by keeping the "core" / "stable" functions as BPF helpers, instead of just making them "super stable" kfuncs? At least then we have the option in the far-far-far future to deprecate them if they eventually, way later, become 100% obsolete. Plus you get the other benefits that Alexei mentioned such as potentially being able to backport them to older kernels by including them in modules, etc. Note that I'm not saying with 100% conviction that we don't have _any_ work to do before freezing helpers (though IMO we should just rip the bandaid and do it now), but I am arguing with strong conviction that once any of that precursor work is taken care of, there is no reason to use BPF helpers in place of kfuncs. At least, that's how I see it at this point. > [0] https://github.com/systemd/systemd/pull/24511/ > > > > > > We'll get the same amount of flame when we try to change kfunc that's > > > widely adopted. > > > > Of course. That's why we need to define a stability and deperecation > > plan for them. > > Lots of things that need to be defined and figured out, but we are > already quick to freeze BPF helpers. I agree with you that it would be prudent for us to iron some of this out more concretely. In this discussion it seems like one of the key points of contention has been around stability, and that the lack of a concrete policy for kfuncs has largely (but not completely) been the cause for concern. Perhaps it would help clarify things if someone submitted a patch set that included a more formal kfunc stability proposal?