Re: RFC: Github PR bot questions

Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> · Thu, 17 Jun 2021 10:47:41 -0400

On Wed, Jun 16, 2021 at 03:11:33PM -0600, Rob Herring wrote:
> > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > pull requests on a project and convert them into fully well-formed patch
> > series. This would be a one-way operation, effectively turning Github into a
> > fancy "git-send-email" replacement. That said, it would have the following
> > benefits for both submitters and maintainers:
> 
> What makes this specific to Github PRs? A Github PR is really just a
> git branch plus a target at least to the extent we would use it here.
> The more of this that works on just a git branch, the more widely
> useful it would be.

It's not specific to GH at all. The same bot will be able to perform similar
actions on emails created by git-request-pull, e.g.:

- submitter runs git-request-pull instead of git-format-patch
- submitter sends the output to a dedicated mailing list like
  pulls@xxxxxxxxxxxxxxx
- the bot auto-converts these requests into patch series and sends them to
  proper destinations

This is more cumbersome to implement, though, which is why I want to get it
done with GH first, as this gets us some immediate perks:

1. we get a fast, stable remote to pull from instead of a potentially slow or
   broken remote that works only intermittently
2. we can offload all sanity checks to GitHub instead of reimplementing them
   in our own CI
3. we end up doing a lot less state tracking for v1..v2..v3 with GitHub

Once the GH implementation is working, I can adapt it to also support other
forges and pull requests sent to mailing lists.

> > - submitters would no longer need to navigate their way around
> >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> >   have a patch-friendly outgoing mail gateway to properly contribute patches
> 
> Presumably, the bot would rely on get_maintainer.pl or it would get
> who to send to based on GH repo and reviewers? Without work on
> get_maintainer.pl, I don't think it will work well beyond simple
> cases.

The bot will actually rely on git-send-email, which can be configured to use
"tocmd" and "cccmd" to get the necessary info from get_maintainer.pl. E.g. in
my tests I have the following:

    tocmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nol"
    cccmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nom"

This does the right thing *most* of the time, and if it's not doing the right
thing, then it's the fault of get_maintainer.pl. :)

> > - subsystem maintainers can configure whatever CI pre-checks they want before
> >   the series is sent to them for review (and we can work on a library of
> >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> 
> What about all the patches that don't come from the GH PR? Those need
> CI pre-checks too. We're going to implement CI twice?

Most likely, yes, though we can certainly weigh how much we want to do on the
GH side. One thing I've thought about is letting the bot inject a Tested-by:
into the patches it creates in order to reflect what has already been done,
e.g.:

    Tested-by: GH Preflight Bot <ghbot@xxxxxxxxxx>
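
A minimal sketch of how such a trailer could be appended to a commit message
before the patch is generated. This is only an illustration, not the bot's
actual code: the function name add_trailer, the trailer-detection heuristic,
and the example.com address are all assumptions made for the example.

```python
import re

# Heuristic for "Key: value" trailer lines (an assumption for this sketch).
TRAILER_RE = re.compile(r"^[A-Za-z0-9-]+: \S")


def add_trailer(message: str, trailer: str) -> str:
    """Return the commit message with `trailer` appended.

    If the last paragraph already looks like a trailer block, the new
    trailer joins it; otherwise a blank line is added first.
    """
    lines = message.rstrip("\n").split("\n")
    if trailer in lines:
        # Already present, so only normalize the trailing newline.
        return "\n".join(lines) + "\n"

    # Locate the last blank line; everything after it is the last paragraph.
    try:
        last_blank = len(lines) - 1 - lines[::-1].index("")
    except ValueError:
        last_blank = -1  # subject line only, no body
    last_paragraph = lines[last_blank + 1:]

    has_trailers = last_blank >= 0 and all(
        TRAILER_RE.match(line) for line in last_paragraph
    )
    if not has_trailers:
        lines.append("")
    lines.append(trailer)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    msg = "subsys: fix thing\n\nBody text.\n\nSigned-off-by: A Dev <a@example.com>\n"
    # Placeholder address, not a real bot identity.
    print(add_trailer(msg, "Tested-by: GH Preflight Bot <ghbot@example.com>"))
```

In practice "git interpret-trailers --trailer" handles the same job, including
edge cases this heuristic would get wrong.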

There is indeed a lot of duplicated CI testing happening for Linux patches,
but it's a separate problem that I believe is being looked at by the Kernel CI
folks.

> The biggest issue I have on CI checks is applying patches. My algorithm is
> apply to my current base (last rc1 typically) or give up. I'm sure it could
> be a lot smarter trying several branches or looking at base-commit (not
> consistently used) or the git diff treeish hashes. What I'd really like is
> some bot or script that's applying series and publishing git branches with a
> messageid to git branch tool. 0-day is doing this now. Basically, the
> opposite direction as others have mentioned.

b4 will try to do this for you with -g, but it will only check against the
last 10 tags, as otherwise this takes a very long time, especially on series
that modify a lot of files. It can probably be a lot more intelligent about it
and work more like git bisect. I'll look into improving this feature.
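
To illustrate the general idea of guessing a base from the blob hashes in a
diff, here is a rough sketch. It is not how b4 implements -g; the names
preimage_blobs, guess_base and blob_at are hypothetical, and the bounded
linear scan over recent tags simply mirrors the "last 10 tags" limit
mentioned above. It assumes paths without spaces.

```python
import re
from typing import Callable, Dict, Iterable, Optional

DIFF_RE = re.compile(r"^diff --git a/(\S+) b/\S+$")
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.[0-9a-f]+")


def preimage_blobs(patch: str) -> Dict[str, str]:
    """Map each modified path to the (abbreviated) blob hash it was based on."""
    blobs: Dict[str, str] = {}
    path = None
    for line in patch.splitlines():
        m = DIFF_RE.match(line)
        if m:
            path = m.group(1)
            continue
        m = INDEX_RE.match(line)
        if m and path is not None:
            old = m.group(1)
            if set(old) != {"0"}:  # all zeros means a newly created file
                blobs[path] = old
            path = None
    return blobs


def guess_base(
    patch: str,
    tags_newest_first: Iterable[str],
    blob_at: Callable[[str, str], Optional[str]],
    limit: int = 10,
) -> Optional[str]:
    """Return the first of the newest `limit` tags whose blobs match the patch.

    `blob_at(tag, path)` is supplied by the caller and returns the full blob
    hash of `path` at `tag`, or None if the file does not exist there. It
    could be backed by "git rev-parse <tag>:<path>", for example.
    """
    wanted = preimage_blobs(patch)
    for tag in list(tags_newest_first)[:limit]:
        if all((blob_at(tag, p) or "").startswith(h) for p, h in wanted.items()):
            return tag
    return None
```

Because the lookup is injected, the search strategy can be changed (for
example, to something closer to a bisect) without touching the parsing.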

> I think it needs to be per maintainer in terms of what checks run, but
> if submission is per maintainer project then the problem will be how
> does the submitter know where to send something? get_maintainer.pl
> tells them? It doesn't do a great job of that IMO. There's not a clear
> distinction of who applies my patch and others Cc'ed (file
> maintainers).

Yes, I'm leaning further towards having the submission point be a single
project per forge, and then just running some bare-minimum checks, similar to
what would be expected of the submitter using git-format-patch.

> I've kind of reached the conclusion that relying on submitters to get
> it right is never going to work (is Cc the DT list for DT patches so
> PW picks them up so hard!?). I think the model needs to be send
> patches to 'the kernel' and then maintainers have tools to extract all
> the patches they are interested in (the planned lore
> local-email-interface).

Yes! I'm hoping that we'll soon get to the point where "just send your patch
to linux-kernel@xxxxxxxxxxxxxxx" becomes a reasonable thing to say again.
E.g., see this thread:
https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/t/#u

However, I'm approaching this from multiple ends, so fixing up
get_maintainer.pl to return something reasonable also needs to happen imo.

-K