Re: Beyond Merge and Rebase: The Upstream Import Approach in Git

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 12 Jul 2023 13:34:05 +0200 (CEST)

Hi Aleksander,

On Tue, 11 Jul 2023, Aleksander Korzyński wrote:

> THE THIRD WAY - UPSTREAM IMPORT
>
> The proposed third way is a special operation that (in the described
> use case) has the advantages of both a merge and a rebase, without the
> disadvantages. The approach is illustrated below:
>
>   o---o---o---o---o  upstream/main
>        \           \
>         \           o'---o'---o'
>          \                     \
>           o---o---o-------------S  main
>
> First, the divergent commits from "main" are rebased on top of
> "upstream/main", but then they are combined back with "main" using a
> special merge commit, which has a custom strategy: it replaces the old
> content of "main" with the new rebased content. This last commit is
> the secret sauce of this solution: the commit has two parents, like an
> ordinary merge, but has the semantics of a rebase.
>
> The structure above has the advantages of both a merge and a rebase.
> On the one hand, just like with an ordinary merge, a user who runs
> "git pull" on their local copy of "main" is not going to see the error
> about divergent branches. On the other hand, just like with an
> ordinary rebase, there is visibility into the last imported commit
> from "upstream/main" and the differences between that commit and the
> tip of "main".

I know this strategy well, having used it initially to maintain Git for
Windows' patches on top of Git releases. I refer to it as the `rebasing
merge` strategy.

The main benefit for me was that the patches were always kept in an
"upstreamable state", which incidentally also helped with resolving the
merge conflicts that occurred while continually rebasing them onto upstream
releases.

However, I soon realized that the delineation between upstream and
downstream patches was unsatisfactory, in particular when new downstream
patches are added. In the context of the example above, try to find a `git
rebase` invocation that rebases the current set of downstream patches:

   o---o---o---o---o---o---o---o  upstream/main
        \           \
         \           o'---o'---o'
          \                     \
           o---o---o-------------S---o---o---o  main

A candidate to describe this in a commit range would be
`upstream/main..main ^S^`, but you cannot pass that to `git rebase -i`,
which expects a single upstream.
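
To make the range concrete, here is a rough sketch that rebuilds the graph
above in a throwaway repository and lists what the range selects. The
`commit` helper, the `rebased` branch, the `S` tag and the file names are
all made up for illustration, `upstream/main` is a plain local branch
standing in for the remote-tracking one, and creating `S` via `git
commit-tree` is merely one way to get a two-parent commit with the rebased
tree (an assumption, not necessarily how anybody does it in practice):

```shell
#!/bin/sh
# Throwaway repository; "upstream/main" is a plain local branch here.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/upstream/main

# Hypothetical helper: one commit adding a file named after its subject.
commit() { echo "$1" >"$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit u1; commit u2
git branch main                      # downstream forks off here
commit u3; commit u4; commit u5
git checkout -q main
commit d1; commit d2; commit d3      # original downstream patches

# Rebasing merge: rebase a copy of main, then record S on main with the
# rebased tree and two parents (old main first, rebased copy second).
git checkout -q -b rebased
git rebase -q upstream/main
git checkout -q main
S=$(git commit-tree -p main -p rebased -m S "rebased^{tree}")
git reset -q --hard "$S"
git tag S
git branch -q -D rebased

commit d4; commit d5; commit d6      # new downstream patches after S
git checkout -q upstream/main
commit u6; commit u7; commit u8      # upstream moves on
git checkout -q main

# The range from above: everything on main that is neither in upstream/main
# nor part of the pre-rebase history behind S's first parent.
patches=$(git log --no-merges --reverse --format=%s upstream/main..main ^S^)
echo $patches
```

In this toy setup the range selects the three rebased patches plus the
three new ones, which is exactly the set one would want to rebase.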

Side note: You could _simulate_ this by calling `git replace --graft
upstream/main upstream/main^ S^` before calling `git rebase -i
upstream/main`, but I found it really easy to forget to remove the replace
object afterwards, and I managed to confuse myself many times before
deciding to use replace objects only very rarely.
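
For illustration, a sketch of that graft and of the clean-up step that is
so easy to forget. It uses the same made-up scaffolding as any toy
reproduction of the graph would (a hypothetical `commit` helper, a
temporary `rebased` branch, an `S` tag, a local branch named
`upstream/main`), and it only counts the commits that `git rebase` would
consider instead of running an interactive rebase:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/upstream/main

# Hypothetical helper: one commit adding a file named after its subject.
commit() { echo "$1" >"$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit u1; commit u2
git branch main
commit u3; commit u4; commit u5
git checkout -q main
commit d1; commit d2; commit d3

# Build S: old main as first parent, rebased copy as second parent.
git checkout -q -b rebased
git rebase -q upstream/main
git checkout -q main
S=$(git commit-tree -p main -p rebased -m S "rebased^{tree}")
git reset -q --hard "$S"
git tag S
git branch -q -D rebased

commit d4; commit d5; commit d6
git checkout -q upstream/main
commit u6; commit u7; commit u8
git checkout -q main

# Without the graft, the pre-rebase patches are counted as well.
before=$(git rev-list --no-merges --count upstream/main..main)

# Pretend that the tip of upstream/main also has S^ (the old main) as parent.
git replace --graft upstream/main upstream/main^ S^
grafted=$(git rev-list --no-merges --count upstream/main..main)

# The step that is easy to forget: remove the replace object again.
git replace -d "$(git rev-parse upstream/main)" >/dev/null
after=$(git rev-list --no-merges --count upstream/main..main)
leftover=$(git replace -l)

echo "before=$before grafted=$grafted after=$after"
```

While the graft is in place, every Git command in that repository sees
the altered history, not just the rebase, which is probably why a
forgotten replace object is so confusing.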

So I switched to a different scheme instead that I dub "merging rebase".
Instead of finishing the rebase with a merge, I start it with that merge.
In your example, it would look like this:

   o---o---o---o---o  upstream/main
        \           \
         o---o---o---M---o'---o'---o' main

Naturally, `M` _must_ be a merge made with `-s ours` in order to be
"tree-same with upstream/main".

This strategy was implemented initially in
https://github.com/msysgit/msysgit/commit/95ae63b8c6c0b275f460897c15a44a7df5246dfb
and is in use to this day:
https://github.com/git-for-windows/build-extra/blob/main/shears.sh
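
A much simplified sketch of the idea (this is not what shears.sh does
internally): the `commit` helper, the `work` branch and the file names are
hypothetical, `upstream/main` is a local branch standing in for the
remote-tracking one, `git cherry-pick` stands in for the interactive
rebase, and the parent order of `M` is an assumption on my part:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/upstream/main

# Hypothetical helper: one commit adding a file named after its subject.
commit() { echo "$1" >"$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit u1; commit u2
git branch main                      # downstream forks off here
commit u3; commit u4; commit u5
git checkout -q main
commit d1; commit d2; commit d3      # downstream patches
old_main=$(git rev-parse main)

# Start with the merge: begin at upstream/main and merge the old main with
# "-s ours", so that M keeps upstream's tree but has the old history as a
# parent.
git checkout -q -b work upstream/main
git merge -q -s ours -m M main
M=$(git rev-parse HEAD)

# Replay the downstream patches on top of M.
git cherry-pick upstream/main..main >/dev/null
git checkout -q -B main
git branch -q -D work

# M is tree-same with upstream/main, and the old main is still reachable,
# so existing clones can fast-forward.
git diff --quiet "$M" upstream/main
git merge-base --is-ancestor "$old_main" main

# The current downstream patches are now simply everything after M.
patches=$(git log --reverse --format=%s "$M"..main)
echo $patches
```

The point of the exercise is the last command: with `M` as the start
commit, the set of downstream patches is an ordinary linear range that
can be handed to `git rebase -i` as-is.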

This strategy is not without problems, though, which become quite clear
when you accept PRs that are based on commits prior to the most recent
merging rebase (or rebasing merge, both strategies suffer from the same
problem): the _next_ merging rebase will not necessarily find the most
appropriate base commit, in particular when rebasing with
`--rebase-merges`, causing unnecessary merge conflicts.

The underlying problem is, of course, the lack of mapping between
pre-rebase and post-rebase versions of the commits: Git has no idea
that two commits should be considered identical for the purposes of the
rebase, even if their SHA-1s differ. And in my hands, the patch ID has
been a poor tool to address this lack of mapping, almost always failing
for me. Not even a hacked-up `git range-diff` was able to reconstruct the
mapping reliably enough.
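
One way the patch ID can lose track of a commit, as a toy sketch (it is
not meant to cover every failure mode, and the branch names, file names
and the `patch_id` helper are made up): a patch keeps its patch ID when it
is replayed unchanged, but as soon as upstream touches a line that merely
serves as context, the replayed patch gets a different ID even though it
applied without any conflict:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/upstream/main

printf '%s\n' l1 l2 l3 l4 l5 l6 l7 >file.txt
git add file.txt && git commit -q -m base
git branch main

# Upstream touches line 1, which is diff context for the change to line 4.
printf '%s\n' L1-upstream l2 l3 l4 l5 l6 l7 >file.txt
git commit -q -a -m "upstream: change line 1"

git checkout -q main
printf '%s\n' l1 l2 l3 L4-downstream l5 l6 l7 >file.txt
git commit -q -a -m "near: change line 4"
near_old=$(git rev-parse HEAD)
echo new >other.txt && git add other.txt && git commit -q -m "far: add other.txt"
far_old=$(git rev-parse HEAD)

# Replay both patches onto upstream/main; both apply without conflicts.
git checkout -q -b rebased upstream/main
git cherry-pick "$near_old" "$far_old" >/dev/null
near_new=$(git rev-parse HEAD^)
far_new=$(git rev-parse HEAD)

# Hypothetical helper: print only the patch ID of a commit.
patch_id() { git show "$1" | git patch-id --stable | cut -d' ' -f1; }

if [ "$(patch_id "$far_old")" = "$(patch_id "$far_new")" ]
then far_same=yes; else far_same=no; fi
if [ "$(patch_id "$near_old")" = "$(patch_id "$near_new")" ]
then near_same=yes; else near_same=no; fi

echo "far: same patch ID? $far_same"
echo "near: same patch ID? $near_same"
```

The "near" commit is the same change as far as a human is concerned, yet
nothing in the repository records that the two versions belong together.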

And that problem, as far as I can tell, is still unsolved.

There have been efforts to this end, including
https://lore.kernel.org/git/pull.1356.v2.git.1664981957.gitgitgadget@xxxxxxxxx/,
but I do not think that any satisfying consensus was reached.

Ciao,
Johannes