Re: AUTOSEL process

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 11 Mar 2023 13:26:57 -0500

On Sat, Mar 11, 2023 at 09:48:13AM -0800, Eric Biggers wrote:
> On Sat, Mar 11, 2023 at 11:16:44AM -0500, Theodore Ts'o wrote:
> > On Sat, Mar 11, 2023 at 09:06:08AM -0500, Sasha Levin wrote:
> > >
> > > I suppose that if I had a way to know if a certain commit is part of a
> > > series, I could either take all of it or none of it, but I don't think I
> > > have a way of doing that by looking at a commit in Linus' tree
> > > (suggestions welcome, I'm happy to implement them).
> >
> > Well, this is why I think it is a good idea to have a link to the
> > patch series in lore.  I know Linus doesn't like it, claiming it
> > doesn't add any value, but I have to disagree.  It adds two bits of
> > value.
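A minimal sketch of what a Link tag gives a script, assuming commits carry trailers of the form `Link: https://lore.kernel.org/r/<message-id>` (the helper names `lore_links` and `message_id_from_link` are hypothetical). With the Message-ID in hand, the thread, and with it the rest of the series, can be looked up on lore.

```python
import re
from typing import List
from urllib.parse import unquote

# "Link:" trailers pointing at lore.kernel.org, one per line (assumed layout).
_LINK_RE = re.compile(r"^Link:\s*(https?://lore\.kernel\.org/\S+)\s*$", re.MULTILINE)


def lore_links(commit_message: str) -> List[str]:
    """Return every lore.kernel.org URL found in Link: trailers."""
    return _LINK_RE.findall(commit_message)


def message_id_from_link(url: str) -> str:
    """Treat the last path component of the link as the Message-ID (assumption)."""
    last = [part for part in url.rstrip("/").split("/") if part][-1]
    return unquote(last)  # Message-IDs may appear percent-encoded in URLs
```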

> So, earlier I was going to go into more detail about some of my ideas, before
> Sasha and Greg started stonewalling with "patches welcome" (i.e. "I'm refusing
> to do my job") and various silly arguments about why nothing should be changed.
> But I suppose the worst thing that can happen is that that just continues, so
> here it goes:

"job"? do you think I'm paid to do this work? Why would I stonewall
improvements to the process?

I'm getting a bunch of suggestions, and complaints that I'm not implementing
those suggestions fast enough in my spare time.

> One of the first things I would do if I was maintaining the stable kernels is to
> set up a way to automatically run searches on the mailing lists, and then take
> advantage of that in the stable process in various ways.  Not having that is the
> root cause of a lot of the issues with the current process, IMO.

"if I was maintaining the stable kernels" - why is this relevant? Give
us the tool you've proposed below and we'll be happy to use it. Heck,
don't give it to us; use it to review the patches we're sending out for
review and let us know if we've missed anything.

> Now that lore exists, this might be trivial: it could be done just by hammering
> lore.kernel.org with queries https://lore.kernel.org/linux-fsdevel/?q=query from
> a Python script.
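A sketch of building such a query URL, assuming public-inbox's `s:` (subject) search prefix and the `x=A` parameter that asks for the results as an Atom feed. `lore_search_url` is a hypothetical helper, not an official lore client; for example `lore_search_url("linux-fsdevel", 's:"fs: fix foo"')` would search that list by subject.

```python
from urllib.parse import urlencode

LORE = "https://lore.kernel.org"


def lore_search_url(list_name: str, query: str, atom: bool = True) -> str:
    """Build a lore search URL for one list; atom=True adds x=A (Atom feed output)."""
    params = {"q": query}
    if atom:
        params["x"] = "A"
    return f"{LORE}/{list_name}/?{urlencode(params)}"
```

Fetching the result (e.g. with `urllib.request.urlopen`) is left out on purpose; a real script would want to rate-limit itself, which is the admin-friendliness concern raised next.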

> Of course, there's a chance that won't scale to multiple queries for each one of
> thousands of stable commits, or at least won't be friendly to the kernel.org
> admins.  In that case, what can be done is to download all emails from all
> lists, using lore's git mirrors or Atom feeds, and index them locally.  (Note:
> if the complete history is inconveniently large, then just indexing the last
> year or so would work nearly as well.)
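For the local-index variant, a minimal sketch assuming the messages have already been pulled down (e.g. from lore's git mirrors) and parsed into `email.message.Message` objects. It keys each message by a normalized subject so that a commit title can be looked up directly; the normalization rules and the names `normalize_subject`/`build_subject_index` are illustrative assumptions.

```python
import email.message
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Leading "Re:"/"Fw:"/"Fwd:" markers and bracketed tags like "[PATCH v2 3/5]".
_PREFIX_RE = re.compile(r"^((re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Reduce an email subject to something comparable with a commit title."""
    subject = " ".join(subject.split())  # collapse header folding and runs of spaces
    return _PREFIX_RE.sub("", subject).strip().lower()


def build_subject_index(messages: Iterable[email.message.Message]) -> Dict[str, List[str]]:
    """Map each normalized subject to the Message-IDs that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for msg in messages:
        subject = msg.get("Subject", "")
        msg_id = msg.get("Message-ID", "").strip("<> ")
        if subject and msg_id:
            index[normalize_subject(subject)].append(msg_id)
    return dict(index)
```

Looking up `normalize_subject(commit_title)` in the resulting dict then gives candidate Message-IDs for the original posting, with or without a Link tag.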

> Then once that is in place, that could be used in various ways.  For example,
> given a git commit, it's possible to search by email subject to get to the
> original patch, *even if the git commit does not have a Link tag*.  And it can
> be automatically checked whether it's part of a patch series, and if so, whether
> all the patches in the series are being backported or just some.
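Checking series completeness could start from the `[PATCH vN i/N]` tag in each posting's subject. A sketch, assuming the subjects of the whole thread are already at hand and that commit titles match the subject text after the tag (often, but not always, true); both function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

# "[PATCH v2 3/5] title" -> version 2, patch 3 of 5. Prefix layouts vary between
# subsystems (e.g. "[PATCH net-next v3 1/4]"), so this pattern is an assumption.
_TAG_RE = re.compile(r"^\[(?P<tag>[^\]]*?)(?P<idx>\d+)/(?P<total>\d+)\]\s*(?P<title>.+)$")
_VER_RE = re.compile(r"\bv(\d+)\b", re.IGNORECASE)


def parse_series_subject(subject: str) -> Optional[Tuple[int, int, int, str]]:
    """Return (version, index, total, title), or None for an unnumbered patch."""
    m = _TAG_RE.match(subject.strip())
    if not m:
        return None
    ver = _VER_RE.search(m.group("tag"))
    return (int(ver.group(1)) if ver else 1,
            int(m.group("idx")), int(m.group("total")), m.group("title").strip())


def unbackported_patches(series: Iterable[str], queued_titles: Set[str]) -> List[str]:
    """Titles of numbered patches (cover letter 0/N skipped) missing from the queue."""
    missing = []
    for subject in series:
        parsed = parse_series_subject(subject)
        if parsed and parsed[1] > 0 and parsed[3] not in queued_titles:
            missing.append(parsed[3])
    return missing
```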

> This could also be used to check for mentions of a commit on the mailing list
> that potentially indicate a regression report, which is one of the issues we
> discussed earlier.  I'm not sure what the optimal search criteria would be, but
> one option would be something like "messages that contain the commit title or
> commit ID and are dated after the commit was committed".  There might need
> to be some exclusions added to that.
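One literal reading of that criterion, as a sketch: scan messages dated after the commit and flag those whose subject or plain-text body contains the commit title or its 12-character abbreviated ID. The 12-character prefix follows the usual `Fixes:` tag convention; the function name is hypothetical, `committed` must be timezone-aware, and no exclusions (for stable review mails, say) are applied yet.

```python
import email.message
import email.utils
from datetime import datetime, timezone
from typing import Iterable, List


def _plain_text(msg: email.message.Message) -> str:
    """Concatenate the text/plain parts of a message, ignoring everything else."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                chunks.append(payload.decode(charset, "replace"))
            except LookupError:  # unknown charset name
                chunks.append(payload.decode("utf-8", "replace"))
    return "\n".join(chunks)


def possible_regression_reports(messages: Iterable[email.message.Message], sha: str,
                                title: str, committed: datetime) -> List[str]:
    """Message-IDs of mails dated after `committed` that mention the title or short ID."""
    short_id = sha[:12]  # 12 hex digits, as in the usual Fixes: tag format
    hits = []
    for msg in messages:
        try:
            sent = email.utils.parsedate_to_datetime(msg.get("Date", ""))
        except (TypeError, ValueError):
            continue  # missing or unparseable Date header
        if sent.tzinfo is None:
            sent = sent.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
        if sent <= committed:
            continue
        text = msg.get("Subject", "") + "\n" + _plain_text(msg)
        if short_id in text or title in text:
            hits.append(msg.get("Message-ID", "").strip("<> "))
    return hits
```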

> This could also be used to automatically find the AUTOSEL email, if one exists,
> and check whether it's been replied to or not.
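Finding the AUTOSEL posting and whether it drew replies could rely on threading headers alone. A sketch assuming AUTOSEL subjects contain the literal string `AUTOSEL` plus the commit title, which matches how those emails are usually titled but is an assumption here; `autosel_reply_status` is hypothetical.

```python
import email.message
from typing import Dict, Iterable, List, Optional


def _referenced_ids(value: Optional[str]) -> List[str]:
    """Split an In-Reply-To or References header into bare Message-IDs."""
    return [tok.strip("<>") for tok in (value or "").split() if tok.startswith("<")]


def autosel_reply_status(messages: Iterable[email.message.Message], title: str) -> Dict[str, int]:
    """Map each AUTOSEL posting of `title` to how many messages in its thread reference it."""
    msgs = list(messages)
    postings: Dict[str, int] = {}
    for m in msgs:
        subject = m.get("Subject", "")
        if "AUTOSEL" in subject and title in subject and not subject.lower().startswith("re:"):
            postings[m.get("Message-ID", "").strip("<> ")] = 0
    for m in msgs:
        # A set, so a reply naming the posting in both headers counts once.
        refs = set(_referenced_ids(m.get("In-Reply-To")) + _referenced_ids(m.get("References")))
        for ref in refs:
            if ref in postings:
                postings[ref] += 1
    return postings
```

A zero count means nobody responded; a non-zero count marks a thread someone should read before the patch goes out.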

> The purpose of all these mailing list searches would be to generate a list of
> potential issues with backporting each commit, which would then undergo brief
> human review.  Once issues are reviewed, that state would be persisted, so that
> if the script gets run again, it would only show *new* information based on new
> mailing list emails that have not already been reviewed.  That's needed because
> these issues need to be checked for when the patch is initially proposed for
> stable as well as slightly later, before the actual release happens.
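The persisted review state could be as simple as a JSON file mapping each commit to the Message-IDs already looked at. A minimal sketch; the file layout and the function names are assumptions, not an existing stable tool.

```python
import json
from pathlib import Path
from typing import Dict, List, Set

State = Dict[str, Set[str]]  # commit SHA -> Message-IDs already reviewed


def load_state(path: Path) -> State:
    """Read the reviewed-findings file; a missing file means nothing reviewed yet."""
    if not path.exists():
        return {}
    return {sha: set(ids) for sha, ids in json.loads(path.read_text()).items()}


def save_state(path: Path, state: State) -> None:
    """Write the state back as sorted lists so the file diffs cleanly."""
    path.write_text(json.dumps({sha: sorted(ids) for sha, ids in state.items()}, indent=2))


def new_findings(state: State, sha: str, message_ids: List[str]) -> List[str]:
    """Only the Message-IDs not yet reviewed for this commit."""
    seen = state.get(sha, set())
    return [m for m in message_ids if m not in seen]


def mark_reviewed(state: State, sha: str, message_ids: List[str]) -> State:
    """Record findings as reviewed; returns the same (mutated) state."""
    state.setdefault(sha, set()).update(message_ids)
    return state
```

Keeping file I/O in `load_state`/`save_state` and the filtering in plain functions makes the "only show new information" logic easy to test on its own.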

> If the stable maintainers have no time for doing *any* human review themselves
> (again, I do not know what their requirements are on how much time they can
> spend per patch), then instead an email with the list of potential issues could
> be generated and sent to stable@xxxxxxxxxxxxxxx for review by others.
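Generating that email is the easy part. A sketch using the standard library's `EmailMessage`, with the recipient left as a parameter since the list address is obfuscated in this archive; `build_review_request` and the report layout are hypothetical.

```python
from email.message import EmailMessage
from typing import Dict, List


def build_review_request(to_addr: str, findings: Dict[str, List[str]]) -> EmailMessage:
    """Compose a plain-text report listing potential backport issues per commit."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"Potential stable backport issues ({len(findings)} commit(s))"
    lines = []
    for commit, issues in sorted(findings.items()):
        lines.append(commit)
        lines.extend(f"  - {issue}" for issue in issues)
        lines.append("")  # blank line between commits
    msg.set_content("\n".join(lines))
    return msg
```

Sending it would then be a call to `smtplib.SMTP(...).send_message(msg)` against whatever relay the script runs behind.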

> Anyway, that's my idea.  I know the response will be either "that won't work" or
> "patches welcome", or a mix of both, but that's it.

I've been playing with this in the past - I had a bot that looked at the
mailing lists for patches that were tagged for stable, attempted to
apply/build them on the multiple trees to verify that they worked, and sent
a reply back if something went wrong, asking for a backport.
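The apply-check half of such a bot might look like the following sketch, which runs `git apply --check` against each stable tree checkout and reports where the patch does not apply. Building is omitted; the tree paths and the function name are hypothetical, and the `run` parameter exists only so the check can be exercised without real trees.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def trees_needing_backport(patch: str, trees: Sequence[str],
                           run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Return the tree checkouts on which `git apply --check` rejects the patch."""
    patch_path = os.path.abspath(patch)  # git -C changes directory, so pin the path
    failing = []
    for tree in trees:
        result = run(["git", "-C", tree, "apply", "--check", patch_path],
                     capture_output=True, text=True)
        if result.returncode != 0:
            failing.append(tree)
    return failing
```

A non-empty result is where the bot would reply asking for a backport.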

It gets a bit tricky: as there's no way to go back from a commit to the
initial submission, you start hitting issues like:

- Patches get re-sent multiple times (think stuff like tip trees,
reviews from other maintainers, etc).
- Different versions of patches - for example, v1 was a single patch
and in v2 it became multiple patches.

I'm not arguing against your idea; I'm just saying that it's not as
trivial as you suggest. An incomplete solution here simply won't scale to
the thousands of patches that flow through the trees, and won't be as
useful.

If you disagree and really think it's trivial, could you take 5 minutes
to write something up? Please?

--
Thanks,
Sasha