Hi Johannes,

My responses are inline:

> I know this strategy well, having used it initially to maintain Git for
> Windows' patches on top of Git releases.

It's good to know others are using similar ideas :-)

> However, I soon realized that the delineation between upstream and
> downstream patches was unsatisfactory, in particular when new downstream
> patches are added. In the context of the example above, try to find a `git
> rebase` invocation that rebases the current set of downstream patches:
>
>   o---o---o---o---o---o---o---o  upstream/main
>        \           \
>         \           o'---o'---o'
>          \                     \
>           o---o---o-------------S---o---o---o  main

We have solved that problem with custom scripting. The git-upstream[1]
tool properly rebases the commits in that case. This is one of the
reasons why I would like to see the git-upstream functionality
reimplemented in git itself. With git today, you can't achieve that with
a single `git rebase` command, but you can with a series of commands.
Introducing a new command or switch to git would allow us to perform
that operation with a single command.

Let's recap what the automation needs to do. Assume the following
situation:

    o---o---o---o---x---o---o---t  tag=v1.2.3 branch=upstream/main
         \           \
          \           a'---b'---c'
           \                     \
            a---b---c-------------S---d---e  branch=main

Let's assume the user has the "main" branch checked out and they want to
import the latest tag from the "upstream/main" branch.
The commands they run are:

    git checkout main
    git upstream import v1.2.3

The automation should now perform the following:

* create a new branch "import/v1.2.3" starting from tag "v1.2.3"
* rebase a', b', c' onto "import/v1.2.3"
* rebase d, e onto "import/v1.2.3"
* perform the cauterizing merge of "import/v1.2.3" to "main"

Here are important observations:

* the first rebase operates on commits present on the main branch,
  starting from the first commit after x, ending with the last commit
  before S
* the second rebase operates on commits present on the main branch,
  starting from the first commit after S, ending with the tip of main

So the problem boils down to identifying commits x and S. Once we
identify these commits, we can perform the rebases.

To identify x we need to find the most recent common ancestor of "main"
and "v1.2.3".

To identify S we need to iterate over branch "main" starting from x and
forward in time until we find the first merge.

That's the logic that needs to be implemented. If that logic were
available under a single command or switch in git, we'd be able to
perform the upstream import operation without a helper script such as
git-upstream.

> This strategy is not without problems, though, which becomes quite clear
> when you accept PRs that are based on commits prior to the most recent
> merging rebase (or rebasing merge, both strategies suffer from the same
> problem): the _next_ merging rebase will not necessarily find the most
> appropriate base commit, in particular when rebasing with
> `--rebase-merges`, causing unnecessary merge conflicts.

This can also be solved with custom logic.
Let's consider the scenario in detail:

    o---o---o---o---x---o---o---t  tag=v1.2.3 branch=upstream/main
         \           \
          \           a'---b'
           \                \
            a---b------------S---c---M---f  branch=main
                 \                  /
                  d----------------e  branch=topic

As before, the user runs the following commands:

    git checkout main
    git upstream import v1.2.3

In this case, the automation should rebase d and e between c and f:

    o---o---o---o---x---o---o---t  tag=v1.2.3 branch=upstream/main
         \           \           \
          \           a'---b'     a"--b"--c"--d"--e"--f"
           \                \                          \
            a---b------------S---c---M---f--------------S'  branch=main
                 \                  /
                  d----------------e  branch=topic

This logic can be implemented as follows. When the automation reaches
the merge commit M, it finds the second parent e and then searches for
the most recent common ancestor of e and main, so that it finds b. The
rebase then operates on commits starting from the first commit after b
and ending with the second parent of M.

The logic above could also be incorporated into git.

> The underlying problem is, of course, the lack of mapping between
> pre-rebase and post-rebase versions of the commits: Git has no idea
> that two commits should be considered identical for the purposes of the
> rebase, even if their SHA-1 differs. And in my hands, the patch ID has
> been a poor tool to address this lack of mapping, almost always failing
> for me. Not even hacked-up `git range-diff` was able to reconstruct the
> mapping reliably enough.
>
> And that problem, as far as I can tell, is still unsolved.

As shown above, we don't actually need to be able to map pre-rebase and
post-rebase versions of the commits in order to correctly perform the
"git upstream import" operation. The "git-upstream" helper script is a
working implementation of the strategy without doing the mapping.

That being said, being able to map pre-rebase and post-rebase versions
of the commits is useful for something else: dropping patches that have
been incorporated upstream. The "git-upstream" script utilizes two
strategies for that purpose.
One of them is to use patch-id. The other one is to use an arbitrary
identifier that you attach to the commit both in the "main" and
"upstream/main" branches. In our case, we have used Gerrit's Change-Id
as the identifier, but it could be something else. Gerrit's Change-Id is
just a random string added to the bottom of a commit message by a git
commit hook.

> So I switched to a different scheme instead that I dub "merging rebase".
> Instead of finishing the rebase with a merge, I start it with that merge.
> In your example, it would look like this:
>
>   o---o---o---o---o  upstream/main
>        \           \
>         o---o---o---M---o'---o'---o'  main

I like Junio's word "cauterize" to describe the special merge :-) So I'm
going to call this strategy "cauterize & rebase" and the strategy I
described in the initial email "rebase & cauterize".

We have also considered "cauterize & rebase" instead of "rebase &
cauterize", and the reason we opted for the latter was peer review in
Gerrit. When we rebase first, we can store the rebased commits on a
temporary import branch and push the import branch to a shared
repository. The import branch then contains everything except for the
last cauterizing merge. We then need to push only the cauterizing merge
into the Gerrit review system. The reviewer then only has to approve the
cauterizing merge to approve the entire "upstream import" structure. We
didn't need to make any changes to the Gerrit review system to utilize
it in that way. These considerations may not apply to other review
systems.

> This strategy was implemented initially in
> https://github.com/msysgit/msysgit/commit/95ae63b8c6c0b275f460897c15a44a7df5246dfb
> and is in use to this day:
> https://github.com/git-for-windows/build-extra/blob/main/shears.sh
> (...)
> https://lore.kernel.org/git/pull.1356.v2.git.1664981957.gitgitgadget@xxxxxxxxx/

Thanks for the links, they are useful :-)

With the content of this email in mind, what are your thoughts?
Would you like to see the strategy become a first-class feature in git?

Best regards,
Aleksander Korzynski

[1] https://opendev.org/x/git-upstream
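
P.S. To make the identification of x and S from the recap a bit more
concrete, here is a minimal Python sketch on a toy commit graph. It is
not taken from git-upstream and does not call git at all: the PARENTS
dict, the o1..o6 labels for the unnamed upstream commits and all helper
names are made up for illustration. It also assumes that the cauterizing
merge S has the old tip of main as its first parent and the import
branch as its second parent, and that the common ancestor is unique.

```python
# Toy model of the recap diagram: commit -> list of parents, first parent first.
# All names here are hypothetical; this is an illustration, not git-upstream code.
PARENTS = {
    "o1": [], "o2": ["o1"], "o3": ["o2"], "o4": ["o3"],
    "x": ["o4"], "o5": ["x"], "o6": ["o5"], "t": ["o6"],  # upstream/main, t = v1.2.3
    "a": ["o2"], "b": ["a"], "c": ["b"],                  # original downstream patches
    "a'": ["x"], "b'": ["a'"], "c'": ["b'"],              # previous import branch
    "S": ["c", "c'"],                                     # previous cauterizing merge
    "d": ["S"], "e": ["d"],                               # new patches, e = tip of main
}


def ancestors(commit):
    """Return the commit itself plus everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen


def merge_base(left, right):
    """Most recent common ancestor (assumes there is exactly one)."""
    common = ancestors(left) & ancestors(right)
    # Keep only common ancestors that no other common ancestor descends from.
    best = [c for c in common
            if not any(c != o and c in ancestors(o) for o in common)]
    return best[0]


def first_parent_chain(tip):
    """Commits on the branch itself, newest first, following first parents."""
    chain, cur = [], tip
    while True:
        chain.append(cur)
        if not PARENTS[cur]:
            return chain
        cur = PARENTS[cur][0]


def find_cauterizing_merge(tip, x):
    """Oldest merge on the branch that already contains x, i.e. S."""
    merges = [c for c in first_parent_chain(tip)
              if len(PARENTS[c]) > 1 and x in ancestors(c)]
    return merges[-1]


def linear_range(base, tip):
    """Commits after base up to and including tip, oldest first."""
    out, cur = [], tip
    while cur != base:
        out.append(cur)
        cur = PARENTS[cur][0]
    return out[::-1]


x = merge_base("e", "t")
s = find_cauterizing_merge("e", x)
previous_import = linear_range(x, PARENTS[s][1])  # input of the first rebase
new_patches = linear_range(s, "e")                # input of the second rebase
print(x, s, previous_import, new_patches)
```

On this toy graph the sketch picks x and S as in the diagram, with
a', b', c' as the input of the first rebase and d, e as the input of the
second. The topic-branch scenario would need the extra handling of M
described above, which this sketch leaves out.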