On Wed, 18 Dec 2019 at 19:57, Tom Clarkson <tqclarkson@xxxxxxxxxx> wrote:
>
> > Overall I think your proposed algorithm is reasonable (even though I
> > think it won't address some of the cases in our repo). Will your
> > algorithm allow us to pass $dir to git rev-list, for the initial
> > split?
>
> Is this just for performance reasons? As I understand it that was left out because it would exclude relevant commits on an existing subtree, but it could make sense as an optimization for the first split of a large repo.

Yes, it's for performance reasons on a first split that I'd like to see it. On the FreeBSD repo the difference is some 40 minutes vs. a few seconds.

> So the process becomes something like
>
> # clear the cache - shouldn't usually be necessary, but it's a universal debugging step.
> git subtree clear-cache --prefix=dir
>
> # ref and all its parents are before subtree add. Treat any children as initial commits.
> git subtree ignore --prefix=dir ref
>
> # ref and all its parents are known subtree commits to be included without transformation.
> git subtree existing --prefix=dir ref
>
> # Override an arbitrary mapping, either for performance or because that commit is problematic
> git subtree map --prefix=dir mainline-ref subtree-ref
>
> # Run the existing algorithm, but skipping anything defined manually
> git subtree split --prefix=dir

This sounds about perfect.

> > For a concrete example (from the repo at
> > https://github.com/freebsd/freebsd), 7f3a50b3b9f8 is a mainline commit
> > that added a new subtree, from 9ee787636908. I think that if I could
> > inform subtree split that 9ee787636908 is the root it would work for
> > me.
>
> Aside from the metadata, that one is a bit different from a standard subtree add in that it copies three folders from the subtree repo rather than the root - so the contents of contrib/elftoolchain will never exactly match the actual elftoolchain repo, and 9ee787636908 is neither mainline nor subtree as subtree split understands it.

Fair enough, and we have lots of examples of slightly strange history in svn that svn2git represents in interesting ways.

> If you ignore 9ee787636908, the resulting subtree will be fairly clean, but won’t have much of a relationship to the external repo.
>
> If you treat 9ee787636908 as an existing subtree, the second commit on your subtree will be based on 7f3a50b3b9f8, which deletes most of the contents of the subtree. You should still be able to merge in updates from the external repo, but if you try to push changes upstream the deletion will break things.

I think this is fine - our main goal here is to be able to update contrib/ code within FreeBSD as we do today with svn, and we may well always have some changes that are never intended to be pushed upstream.

Continuing the example from our repo, there is more history in the "subtree" already, with 061ef1f9424f as the head. ca8624403626 is the merge to mainline.
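
On the earlier point about passing $dir to git rev-list: the rough sketch below shows the effect I have in mind. It is only an illustration on a throwaway repo, not how subtree split is implemented; the directory names, commit messages and variable names are made up for the example. The only real behaviour it relies on is that git rev-list accepts a path after "--" and then walks only the commits that touch that path.

```shell
#!/bin/sh
# Sketch only: a throwaway repo stands in for a large mainline repository.
# All paths, messages and variable names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "example@example.invalid"
git config user.name "Example"

# Three commits outside the prefix, then two commits inside it.
mkdir -p contrib/lib src
for i in 1 2 3; do
    echo "$i" > src/file.txt
    git add src/file.txt
    git commit -q -m "mainline change $i"
done
for i in 1 2; do
    echo "$i" > contrib/lib/file.txt
    git add contrib/lib/file.txt
    git commit -q -m "contrib change $i"
done

dir=contrib/lib
# Without a path, rev-list visits every commit reachable from HEAD.
all_count=$(git rev-list --count HEAD)
# With a path, rev-list only reports commits that modified that path.
dir_count=$(git rev-list --count HEAD -- "$dir")
echo "all commits: $all_count"
echo "commits touching $dir: $dir_count"
```

On a repo the size of FreeBSD the second form has far fewer commits to hand to the rest of the split, which is where I would expect the time saving on a first split to come from. As noted above, it would skip relevant commits once a subtree with its own history has been merged in, so it only seems safe for the initial split.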