Hello,

Git users often have to make a choice: to merge or rebase. I'm going to describe a third way that has the characteristics of both and is very well suited for tracking an open-source project or any other upstream branch. I'm looking for feedback on the approach.

MERGE OR REBASE?

Let's assume that you have forked an upstream open-source repository and keep the fork in your own repo. The default branch of the upstream repository is called "main" and is called the same in your own fork. You have made a few changes to the source code and committed them to the "main" branch of your fork. In the meantime, new changes have been committed to the upstream "main" branch of the project. How do you import the upstream changes to your fork?

Let's assume that your local fork also contains a branch called "upstream/main", which reflects the state of the upstream's "main" branch. So the "main" branch contains your own changes and the "upstream/main" branch contains the community's changes:

                  time -->

    o---o---o---o---o  upstream/main
         \
          o---o---o  main

So a different way to ask the question is: how do you bring upstream/main's changes into main?

One solution is to merge "upstream/main" into "main":

    o---o---o---o---o  upstream/main
         \           \
          o---o---o---M  main

The merge above would certainly work, but it becomes problematic as time passes and you get a lot of these merges in your "main" branch. You then no longer have visibility into the differences between "upstream/main" and "main", because your commits get lost deep in the history of the branch, as illustrated below:

    o---o---o---o---o---o---o---o---o---o---o  upstream/main
         \           \       \       \       \
          o---o---o---M---o---M---o---M---o---M  main

So the alternative solution is to rebase your "main" branch on top of "upstream/main":

    o---o---o---o---o  upstream/main
                     \
                      o'---o'---o'  main

You now have the advantage of having greater visibility into the differences between "upstream/main" and "main".
However, a rebase comes with a different problem: if any user of your fork had the "main" branch checked out in their local repository and they run "git pull", they are going to get an error stating that the local and upstream branches have diverged. They will have to take special steps to recover from the rebase of the "main" branch.

So how can that problem be solved?

THE THIRD WAY - UPSTREAM IMPORT

The proposed third way is a special operation that (in the described use case) has the advantages of both a merge and a rebase, without the disadvantages. The approach is illustrated below:

    o---o---o---o---o  upstream/main
         \           \
          \           o'---o'---o'
           \                     \
            o---o---o-------------S  main

First, the divergent commits from "main" are rebased on top of "upstream/main". Then the rebased commits are combined back with "main" using a special merge commit (S), which has a custom strategy: it replaces the old content of "main" with the new rebased content. This last commit is the secret sauce of this solution: the commit has two parents, like an ordinary merge, but has the semantics of a rebase.

The structure above has the advantages of both a merge and a rebase. On the one hand, just like with an ordinary merge, a user who runs "git pull" on their local copy of "main" is not going to see the error about divergent branches. On the other hand, just like with an ordinary rebase, there is visibility into the last imported commit from "upstream/main" and the differences between that commit and the tip of "main".

DROPPING PATCHES

What is supposed to happen if one of the commits from "main" is ported to "upstream/main", as illustrated below?

    o---o---o---A'---o  upstream/main
         \
          \
           \
            A---B---C  main

In that case, the upstream importing operation should drop that patch, as illustrated below:

    o---o---o---A'---o  upstream/main
         \            \
          \            B'---C'
           \                 \
            A---B---C---------S  main

But how would the upstream importing operation know which patches to drop? There are two ways.
Firstly, it can look at git's patch-id, which is a hash of the file changes with line numbers ignored. This is the same strategy that rebase uses to drop duplicate commits.

Secondly, it can use an arbitrary change-id associated with a commit (for example, for projects that use Gerrit, it can be the Gerrit Change-Id, which is saved in the commit message). This is useful when a given patch lands upstream in a slightly changed form, but is meant to replace the version in "main".

IMPLEMENTATION

The solution above has already been implemented in an open-source Python script called git-upstream[1], published 10 years ago. It was originally implemented for the OpenStack project, but the solution is generic and applicable to any open-source project.

It is going to be easier for users to benefit from the ideas behind git-upstream if the functionality is integrated directly into git. Would you like to see the above functionality integrated directly into git?

Best regards,
Aleksander Korzynski
www.linkedin.com/in/akorzy
www.devopsera.com/blog

P.S. For completeness, I'm providing links to alternative solutions for tracking patches:

* git-upstream[1] uses the strategy described above
* quilt[2] uses patch files saved in a source code repository
* StGit[3] is inspired by quilt and uses git commits to store patches
* MQ[4] is also inspired by quilt and implements a patch queue in Mercurial

[1] https://opendev.org/x/git-upstream
[2] https://savannah.nongnu.org/projects/quilt
[3] https://stacked-git.github.io
[4] https://wiki.mercurial-scm.org/MqExtension
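
P.P.S. To make the DROPPING PATCHES rules above a bit more concrete, here is a minimal sketch in Python. It is only a toy model under my own assumptions, not the code of git-upstream and not a proposed git interface: the names Commit, commits_to_carry and import_commit are hypothetical, and the patch-id and change-id values are plain strings standing in for what "git patch-id" or a Change-Id line in the commit message would provide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a git commit (illustration only)."""
    name: str
    patch_id: str                    # assumed to come from `git patch-id`
    change_id: Optional[str] = None  # e.g. a Gerrit Change-Id, if present


def commits_to_carry(local: List[Commit], upstream: List[Commit]) -> List[Commit]:
    """Return the local commits that still need to be rebased.

    A local commit is dropped when upstream already contains a commit
    with the same patch-id, or with the same (non-empty) change-id.
    """
    upstream_patch_ids = {c.patch_id for c in upstream}
    upstream_change_ids = {c.change_id for c in upstream if c.change_id}
    carried = []
    for commit in local:
        same_patch = commit.patch_id in upstream_patch_ids
        same_change = (commit.change_id is not None
                       and commit.change_id in upstream_change_ids)
        if not (same_patch or same_change):
            carried.append(commit)
    return carried


def import_commit(main_tip: str, rebased_tip: str) -> dict:
    """Describe commit S: two parents, content taken from the rebased tip."""
    return {"parents": (main_tip, rebased_tip), "content_from": rebased_tip}


if __name__ == "__main__":
    # The example from the diagrams: A has landed upstream as A'.
    upstream = [Commit("A'", patch_id="p1")]
    local = [Commit("A", "p1"), Commit("B", "p2"), Commit("C", "p3")]
    carried = commits_to_carry(local, upstream)
    print([c.name for c in carried])   # ['B', 'C']
    print(import_commit("C", "C'"))
```

In this model the change-id check is what catches a patch that landed upstream in a slightly changed form (so its patch-id differs), while the patch-id check catches exact ports. The dictionary returned by import_commit only records the shape of S: the old tip of "main" and the rebased tip as parents, with the content taken entirely from the rebased side.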