Re: RFC: Github PR bot questions

Rob Herring <robh@xxxxxxxxxx> · Thu, 17 Jun 2021 11:15:23 -0600

On Thu, Jun 17, 2021 at 8:47 AM Konstantin Ryabitsev
<konstantin@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jun 16, 2021 at 03:11:33PM -0600, Rob Herring wrote:
> > > I've been doing some work on the "github-pr-to-ml" bot that can monitor GitHub
> > > pull requests on a project and convert them into fully well-formed patch
> > > series. This would be a one-way operation, effectively turning Github into a
> > > fancy "git-send-email" replacement. That said, it would have the following
> > > benefits for both submitters and maintainers:
> >
> > What makes this specific to Github PRs? A Github PR is really just a
> > git branch plus a target at least to the extent we would use it here.
> > The more of this that works on just a git branch, the more widely
> > useful it would be.
>
> It's not specific to GH at all. The same bot will be able to perform similar
> actions to emails created by git-request-pull, e.g.:
>
> - submitter runs git request-pull instead of git-format-patch
> - submitter sends the output to a dedicated mailing list like
>   pulls@xxxxxxxxxxxxxxx
> - the bot auto-converts these requests into patch series and sends them to
>   proper destinations

This seems like two separate problems. The first is automating the steps
from a branch to a patch series. The second is just avoiding email
configuration issues.

The first would be useful to everyone, so it would be great if we
could keep these as separate tools. This is why b4 is so great and has
been pretty widely adopted: maintainers can plug in whatever parts of
it they want for their existing workflow.
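
To illustrate the first problem on its own, here is a minimal sketch of
going from a plain git branch to a patch series, with no forge involved.
The helper names (format_patch_cmd, branch_to_series) and the example
refs are hypothetical; the only real interface assumed is git
format-patch with its --cover-letter, --base, -o and -v options.

```python
import subprocess


def format_patch_cmd(base, branch, outdir, version=1):
    """Build the git format-patch argv for the commits in base..branch."""
    cmd = ["git", "format-patch", "--cover-letter", f"--base={base}", "-o", outdir]
    if version > 1:
        # -v<n> marks the series as a reroll, e.g. [PATCH v2 1/3]
        cmd.append(f"-v{version}")
    cmd.append(f"{base}..{branch}")
    return cmd


def branch_to_series(base, branch, outdir, version=1):
    """Run git format-patch and return the patch file names it prints."""
    result = subprocess.run(
        format_patch_cmd(base, branch, outdir, version),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Something like branch_to_series("v5.13-rc1", "my-topic", "outgoing", 2)
would leave the cover letter and patches in outgoing/. Sending them is
then a separate step, which is the point of keeping the tools apart.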

> This is more cumbersome to implement, though, which is why I want to get it
> done with GH first, as this gets us some immediate perks:
>
> 1. we get a fast, stable remote to pull from instead of potentially slow,
>    broken remote that's intermittently working
> 2. we can offload all sanity checking to github instead of reimplementing them
>    with our own CI
> 3. we end up doing a lot less state tracking for v1..v2..v3 with github
>
> Once the GH implementation is working, I can adapt it to also support other
> forges and pull requests sent to mailing lists.
>
> > > - submitters would no longer need to navigate their way around
> > >   git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to
> > >   have a patch-friendly outgoing mail gateway to properly contribute patches
> >
> > Presumably, the bot would rely on get_maintainer.pl or it would get
> > who to send to based on GH repo and reviewers? Without work on
> > get_maintainer.pl, I don't think it will work well beyond simple
> > cases.
>
> The bot will actually rely on git-send-email, which can be configured to use
> "tocmd" and "cccmd" to get the necessary info from get_maintainer.pl. E.g. in
> my tests I have the following:
>
>     tocmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nol"
>     cccmd = "$(git rev-parse --show-toplevel)/scripts/get_maintainer.pl --norolestats --nom"
>
> This does the right thing *most* of the time, and if it's not doing the right
> thing, then it's the fault of get_maintainer.pl. :)

True, but I suspect the complaints will be directed at the bot rather
than generating fixes to get_maintainer.pl.

> > > - subsystem maintainers can configure whatever CI pre-checks they want before
> > >   the series is sent to them for review (and we can work on a library of
> > >   Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
> >
> > What about all the patches that don't come from the GH PR? Those need
> > CI pre-checks too. We're going to implement CI twice?
>
> Most likely, yes, though we can certainly weigh how much we want to do on the
> GH side. One thing I've thought about is letting the bot inject a Tested-by:
> into the patches it creates in order to reflect what's been already done, e.g.:
>
>     Tested-by: GH Preflight Bot <ghbot@xxxxxxxxxx>
>
> There is indeed a lot of duplicated CI testing happening for Linux patches,
> but it's a separate problem that I believe is being looked at by the Kernel CI
> folks.
>
> > The biggest issue I have on CI checks is applying patches. My algorithm is
> > apply to my current base (last rc1 typically) or give up. I'm sure it could
> > be a lot smarter trying several branches or looking at base-commit (not
> > consistently used) or the git diff treeish hashes. What I'd really like is
> > some bot or script that's applying series and publishing git branches with a
> > messageid to git branch tool. 0-day is doing this now. Basically, the
> > opposite direction as others have mentioned.
>
> b4 will try to do this for you with -g, but it will only check against the
> last 10 tags, as otherwise this takes a very long time, especially on series
> that modify a lot of files. It can probably be a lot more intelligent about it
> and work more like git bisect. I'll look into improving this feature.

Hmm, I can't seem to get -g to tell me anything but 'current tree'.

How I think this could work is to extract all the files and their base
treeish hashes from a patch, then iterate through the branches (user
and/or project specific) and find a branch which has a matching set of
treeish hashes. This seems like it wouldn't be too hard for someone who
knows the git internals.

Rob