Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

Sasha Levin <Alexander.Levin@xxxxxxxxxxxxx> · Mon, 16 Apr 2018 18:17:17 +0000

On Mon, Apr 16, 2018 at 01:44:23PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 17:16:10 +0000
>Sasha Levin <Alexander.Levin@xxxxxxxxxxxxx> wrote:
>
>
>> So if a user is operating a nuclear power plant, and has 2 leds: green
>> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and
>> once in a blue moon a race condition is causing the red one to go on and
>> cause panic in the little province he lives in, we should tell that user
>> to fuck off?
>>
>> LEDs may not be critical for you, but they can be critical for someone
>> else. Think of all the different users we have and the wildly different
>> ways they use the kernel.
>
>We can point them to the fix and have them backport it. Or they should
>ask their distribution to backport it.

It may work in your subsystem, but it really doesn't work this way with
the kernel.

Let me share a concrete example with you: there's a vfs bug that's a
pain to reproduce going around. It was originally reported on
CoreOS/AWS:

	https://github.com/coreos/bugs/issues/2356

But our customers reported to us that they're hitting this issue too.

We couldn't reproduce it, and the call trace indicated it may be a
memory corruption. We could however confirm with the customers that the
latest mainline fixes the issue.

Given that we couldn't reproduce it, and neither of us is a fs/ expert,
we sent a mail to LKML, just like you suggested doing:

	https://lkml.org/lkml/2018/3/2/1038

But unlike what you said, no one pointed us to the fix, even though the
issue was fixed on mainline. Heck, no one engaged in any meaningful
conversation about the bug.

I really think that we have different views as to how well the whole
"let me shoot a mail to LKML" process works, which leads to different
views on -stable.

>Hopefully they tested the kernel they are using for something like
>that, and only want critical fixes. What happens if they take the next
>stable assuming that it has critical fixes only, and this fix causes a
>regression that creates the "ALL OK!" when it wasn't.
>
>Basically, I'd rather have stable be more bug compatible with the version
>it is based on with only critical fixes (things that will cause an
>oops) than to try to be bug compatible with mainline, as then we get
>into a state where things are a frankenstein of the stable base version
>and mainline. I could say, "Yeah this feature works better on this
>4.x version of the kernel" and not worry about "4.x.y" versions having
>it better.

This is how things used to work, right? Look at Red Hat kernels for
example, they'd stick with a kernel for tens of years, doing the tiniest
fixes, only when customers complained, and encouraging users to upgrade
only when the kernel would go EoL, and when customers couldn't do that
because they were too locked on that kernel version.

Red Hat still supports 2.6.9.

I thought we agreed that this is bad? We wanted users to be closer to
mainline, and we can't do it without bringing -stable closer to mainline
as well.