Re: [RFC] subtree: handle unmerged history trees

Tom Clarkson <tqclarkson@xxxxxxxxxx> · Mon, 11 May 2020 21:46:59 +1000

> On 7 May 2020, at 12:00 am, Claus Schneider <claus.schneider@xxxxxxxxxxx> wrote:

> - In bare mode it pushes changes to a separate branch containing the
> prefix changes which is fine. You get a problem when you run the next
> split. Either you re-split all the commits again - Or you add the
> -rejoin parameter with the result that the splitted prefix patches are
> part of your history twice or even more if you have further extracts.
> So this is either a performance issue or a usability issue.

A simpler way to link a split without including both histories would be to add a mainline commit with a git-subtree-split annotation, but without having the subtree commit as a parent. That would give you a reference to a commit not reachable from HEAD though, so plenty of opportunity to shoot yourself in the foot.
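
A minimal sketch of that idea in a throwaway repository, to show both the annotation and the foot-gun. The repository layout, the `sub` prefix and the commit messages are made up for illustration, and `git commit-tree` on `HEAD:sub` is only a stand-in for the commit a real split would produce. The trailer names follow what a rejoin commit records; whether find_existing_splits would accept such a commit unchanged is not verified here.

```shell
#!/bin/sh
set -e

# Throwaway repository with one subdirectory to "split".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

mkdir sub
echo one >sub/file.txt
echo top >top.txt
git add .
git commit -q -m "initial"
mainline=$(git rev-parse HEAD)

# Stand-in for the split result: a root commit whose tree is the
# contents of sub/. A real run would take this id from the split.
split=$(git commit-tree -m "sub: initial" "HEAD:sub")

# Ordinary single-parent mainline commit that carries the annotation
# but does NOT have the split commit as a parent.
git commit -q --allow-empty -m "Split 'sub/' into commit '$split'

git-subtree-dir: sub
git-subtree-mainline: $mainline
git-subtree-split: $split"

# One parent only, and the split commit is not reachable from HEAD.
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
if git merge-base --is-ancestor "$split" HEAD
then
    reachable=yes
else
    reachable=no
fi
echo "parents=$parents reachable=$reachable"
```

Because nothing points at the split commit, it only survives as long as some ref (for example a branch in the subtree repo) keeps it alive.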

Persisting the cache between runs would be enough to avoid any potential performance penalty on subsequent splits, and is just a matter of changing the directory used. My unrelated patch implements that for other reasons, along with letting you supply specific commit mappings from a script if that’s what you need.
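
Roughly what a persistent cache amounts to, as a sketch rather than the actual patch. It assumes the cache is a directory of small files, each named after a mainline commit id and containing the matching split commit id. The helper names, the directory layout and the ids are illustrative placeholders, and a temporary directory stands in for a stable location under the repository.

```shell
#!/bin/sh
set -e

# Assumption: one stable directory per prefix instead of one per process.
# mktemp is used only so the demo leaves no trace in a real repository.
cachedir=$(mktemp -d)/subtree-cache/sub

cache_setup () {
    # Create the directory if needed; existing entries are left alone.
    mkdir -p "$cachedir"
}

cache_set () {
    # $1 = mainline commit id, $2 = corresponding split commit id
    printf '%s\n' "$2" >"$cachedir/$1"
}

cache_get () {
    # Prints the mapped id, or nothing if the commit has not been seen.
    if test -r "$cachedir/$1"
    then
        cat "$cachedir/$1"
    fi
}

# First "run": record a mapping (placeholder ids, not real commits).
cache_setup
cache_set aaaa1111 bbbb2222

# Second "run": the directory already exists, so the mapping survives.
cache_setup
hit=$(cache_get aaaa1111)
miss=$(cache_get cccc3333)
echo "hit=$hit miss=$miss"
```

The same `cache_set` call is also all a script would need to inject a specific commit mapping before a split.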

> - Add traceability to each extracted commit in new history
>  - It enables humans to trace from the extracted commit to the
> original commit by basic reading, clicking in tools like gitk and
> scripting if desired
>  - Enable subtree itself to utilize the above mentioned traceability
> and simulate the add repository or rejoin merge commit. Subtree can
> then "behave" similarly independent of the method being used.

Have you considered how your annotations will behave if you import the same subtree repo into two different mainline repos? The subtree history would then have references to a bunch of commits that don’t exist. Adding similar annotations to merge commits on the mainline side seems like a good idea though, and would let you use find_existing_splits to avoid regenerating too many commits.

For the human readable link from the subtree repo to your original monorepo, perhaps a custom annotation would be a better fit - something like

git subtree split dir --annotate-mainline-commit-as="id-in-monorepo"

>  - Add option for rev-list so it can list based on
> prefix/subdirectory. I have not been able to find any error, issues or
> side effects adding the "-- $dir" to the rev-list command. All the
> manual tests, I have done, behave correctly in my total patched git
> revision. It gives a heck of performance for many-commit repositories.

Have you tested the rev-list dir option against preexisting history that lacks your new annotations, or that was created without split? If any of the new commits has a parent that is not in the rev-list, subtree will look up that commit individually and recursively. A git-subtree-mainline annotation will shortcut that, but without it the individual lookup is massively slower than working from even a very large rev-list.
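
A small sketch of the situation being asked about: in a throwaway repository, limiting rev-list to a path drops the commits that do not touch it, so the real parent of a listed commit can be missing from the list. The file names and commit messages are made up for illustration, and this only shows the rev-list behaviour, not how subtree reacts to it.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

# Three commits; only the first and last touch sub/.
mkdir sub
echo 1 >sub/a
git add .
git commit -q -m "sub: first"
echo 1 >top
git add .
git commit -q -m "top only"
echo 2 >sub/a
git add .
git commit -q -m "sub: second"

all=$(git rev-list --count HEAD)
limited=$(git rev-list --count HEAD -- sub)

# The real parent of HEAD is the "top only" commit, which the
# path-limited list does not contain.
real_parent=$(git rev-parse HEAD^)
if git rev-list HEAD -- sub | grep -q "$real_parent"
then
    in_list=yes
else
    in_list=no
fi
echo "all=$all limited=$limited parent_in_list=$in_list"
```

Any code that walks from the newest commit to its actual parent lands on a commit the limited list never mentioned, which is where the individual lookups would start.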