Re: AUTOSEL process

On Mon, Feb 27, 2023 at 09:38:46PM +0000, Eric Biggers wrote:
> On Mon, Feb 27, 2023 at 03:39:14PM -0500, Sasha Levin wrote:
> > > So to summarize, that buggy commit was backported even though:
> > >
> > >   * There were no indications that it was a bug fix (and thus potentially
> > >     suitable for stable) in the first place.
> > >   * On the AUTOSEL thread, someone told you the commit is broken.
> > >   * There was already a thread that reported a regression caused by the commit.
> > >     Easily findable via lore search.
> > >   * There was also already a pending patch that Fixes the commit.  Again easily
> > >     findable via lore search.
> > >
> > > So it seems a *lot* of things went wrong, no?  Why?  If so many things can go
> > > wrong, it's not just a "mistake" but rather the process is the problem...
>
> BTW, another cause of this is that the commit (66f99628eb24) was AUTOSEL'd after
> only being in mainline for 4 days, and *released* in all LTS kernels after only
> being in mainline for 12 days.  Surely that's a timeline befitting a critical
> security vulnerability, not some random neural-network-selected commit that
> wasn't even fixing anything?

> > I would love to have a mechanism that tells me with 100% confidence if a
> > given commit fixes a bug or not; could you provide me with one?

> Just because you can't be 100% certain whether a commit is a fix doesn't mean
> you should be rushing to backport random commits that have no indications they
> are fixing anything.

The difference of opinion here is that I don't think it's rushing: the
stable kernel rules say a commit must be in a released kernel, while the
AUTOSEL timelines mean a commit will have been in two released kernels.

Should we extend it? Maybe; I just don't think we have enough data to
make a decision either way.
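
For what it's worth, the dates involved here are easy enough to pull
straight out of git. A rough sketch, using the same stable/linux-5.15.y
naming as later in this mail, and relying on the usual "commit <sha>
upstream" line that stable backports carry:

	# Rough sketch: when did 66f99628eb24 land in mainline, and when did it
	# first ship in a 5.15.y release?  Assumes mainline and stable are fetched.
	c=66f99628eb24
	git log -1 --format='%cs  mainline commit' "$c"
	# The 5.15.y backport carries a "commit <sha> upstream" line; find it.
	backport=$(git log --format=%H --grep "commit $c" stable/linux-5.15.y | tail -n1)
	# Earliest 5.15.y tag that contains that backport.
	first=$(git tag --contains "$backport" 'v5.15.*' | sort -V | head -n1)
	git log -1 --format="%cs  $first" "$first"

The same check works for the other LTS branches with the branch and tag
pattern swapped.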

> > w.r.t. timelines, this is something that was discussed on the mailing
> > list a few years ago, where we decided that giving AUTOSEL commits 7 days
> > of soaking time is sufficient; if anything has changed we can have this
> > discussion again.

> Nothing has changed, but that doesn't mean that your process is actually
> working.  7 days might be appropriate for something that looks like a security
> fix, but not for a random commit with no indications it is fixing anything.

How do we know whether this is working or not, though? How do you
quantify the number of useful commits?

How do you know if a certain fix has security implications? Or even if
it actually fixes anything? For every "security" commit tagged for
stable I could probably list a "security" commit with no tags whatsoever.

> BTW, based on that example it's not even 7 days between AUTOSEL and patch
> applied, but actually 7 days from AUTOSEL to *release*.  So e.g. if someone
> takes just a 1 week vacation, in that time a commit they would have NAK'ed can
> be AUTOSEL'ed and pushed out across all LTS kernels...

Right, and same as above: what's "enough"?

> > Note, however, that it's not enough to keep pointing at a tiny set and
> > using it to suggest that the entire process is broken. How many AUTOSEL
> > commits introduced a regression? How many -stable tagged ones did? How
> > many bugs did AUTOSEL commits fix?

> So basically you don't accept feedback from individual people, as individual
> people don't have enough data?

I'd love to improve the process, but for that we need to figure out
criteria for what we consider good or bad, collect data, and make
decisions based on that data.

What I'm getting from this thread is a few anecdotal examples and
statements that the process isn't working at all.

I took Jon's stablefixes script, which he used for his previous articles
on stable kernel regressions (here: https://lwn.net/Articles/812231/),
and tried running it on the 5.15 stable tree (just a random pick). I
ignored the non-user-visible regressions as Jon defined them in his
article (basically issues that were introduced and fixed in the same
release) and ended up with 604 commits that caused a user-visible
regression.

Out of those 604 commits:

 - 170 had an explicit stable tag.
 - 434 did not have a stable tag.

Looking at the commits in the 5.15 tree:

With stable tag:

	$ git log --oneline -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
	3676

Without stable tag (96 of these are version bumps, excluded from the rate below):

	$ git log --oneline --invert-grep -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
	10649
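
If I picked them out correctly, those 96 version bumps are just the
"Linux 5.15.x" release commits, i.e. something like:

	$ git log --oneline --grep "^Linux 5\.15\." v5.15..stable/linux-5.15.y | wc -l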

Regression rate for commits with stable tag: 170 / 3676 = 4.62%
Regression rate for commits without a stable tag: 434 / 10553 = 4.11%
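
For anyone who wants to poke at the raw numbers, this is roughly the
kind of counting involved. It's a crude sketch rather than Jon's actual
script (it skips his same-release filtering, and the intermediate file
names are made up), but it shows where the inputs come from:

	RANGE="v5.15..stable/linux-5.15.y"

	# Mainline IDs of everything backported into the branch, taken from the
	# "commit <sha> upstream" / "[ Upstream commit <sha> ]" lines (12-char prefixes).
	git log --no-merges --format=%b "$RANGE" |
	    grep -oiE '(commit [0-9a-f]{40} upstream|upstream commit [0-9a-f]{40})' |
	    grep -oE '[0-9a-f]{40}' | cut -c1-12 | sort -u > upstream-ids

	# Targets of every Fixes: tag that shows up in the branch.
	git log --no-merges --format=%b "$RANGE" |
	    grep -iE '^fixes: [0-9a-f]' | grep -oE '[0-9a-f]{12,40}' | cut -c1-12 |
	    sort -u > fixes-targets

	# Backported commits that a later commit in the same branch marks as broken.
	comm -12 upstream-ids fixes-targets | wc -l

It won't reproduce the exact numbers above, but it's enough to see how
they were derived.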

Is the analysis flawed somehow? Probably, and I'd happily take feedback on
how/what I can do better, but this type of analysis is what I look for
to know whether the process is working well or not.

--
Thanks,
Sasha


