On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote:
On 2018-12-03 at 11:22, Sasha Levin wrote:
This is a case where theory collides with the real world. Yes, our QA is
lacking, but we don't have the option of not doing the current process.
If we stop backporting until a future date where our QA problem is
solved we'll end up with what we had before: users stuck on ancient
kernels without a way to upgrade.
Sorry, but you seem to be living in a different "real world"...
People stay on "ancient kernels" that "just work" instead of updating
to a newer one that "hopefully/maybe/... works".
If users are stuck at older kernels and refuse to update then there's
not much I can do about it. They are knowingly staying on kernels with
known issues and will end up paying a much bigger price later to update.
With the current model we're aware that bugs sneak through, but we try
to deal with it by both improving our QA, and encouraging users to do
their own extensive QA. If we encourage users to update frequently we
can keep improving our process and the quality of kernels will keep
getting better.
And here you want to turn/force users into QA ... good luck with that.
Yes, users are expected to test their workloads with new kernels - I'm
not sure why this is a surprise to anyone. Isn't it true for every other
piece of software?
I invite you to read Jon's great summary on LWN of a related session
that happened during the maintainer's summit:
https://lwn.net/Articles/769253/ . The conclusion reached was very
similar.
In reality they won't "update frequently", instead they will stop
updating when they have something that works... and start ignoring
updates as they expect something "to break as usual" as they actually
need to get some real work done too...
Again, this model was proven to be bad in the past, and if users keep
following it then they're knowingly shooting themselves in the foot.
We simply can't go back to the "enterprise distro" days.
Maybe so, but we should at least get back to having "stable" or
"longterm" actually mean something again...
Or what does it say when distros start thinking about ignoring
(and some already do) stable/longterm trees because there are
_way_ too many questionable changes coming through, even overriding
maintainers to the point where they basically state "we don't care
about monitoring stable trees anymore, as they add whatever they want
anyway"...
I'm assuming you mean "enterprise distros" here, as most of the
community distros I'm aware of are tracking stable trees.
Enterprise distros are a mix of everything: on one hand they would
refuse most stable patches because they don't have any demand from
customers to fix those bugs, but on the other hand they will update
drivers and subsystems as a whole to create these frankenstein kernels
that are very difficult to support.
When your kernel is driven by paying customer demands it's difficult to
argue for the technical merits of your process.
And pretending that every fix is important enough to backport,
and saying that if you don't take everything you have an "unsecure"
kernel, won't help, as reality has shown from time to time that backports
can/will open up a new issue instead, for no good reason.
Which, for distros, starts to mean that switching back to selectively
taking fixes for _known_ security issues is considered a way better choice.
That was my exact thinking 2 years ago (see my stable-security project:
https://lwn.net/Articles/683335/). I even had a back-and-forth with Greg
on LKML when I was trying to argue your point: "Let's only take security
fixes because no one cares about the other crap".
If you're interested, I'd be happy to explain further why this was a
complete flop.
--
Thanks,
Sasha