On Tue, 17 Dec 2019 at 19:17, Tom Clarkson <tqclarkson@xxxxxxxxxx> wrote:
>
> The algorithm I am looking at to replace the file based mainline detection is
>
> - If subtree root is unknown (as on the initial split), everything is mainline.
>
> - If subtree root is reachable and mainline root is not, it’s a subtree commit
>
> - Otherwise, treat as mainline. This will also pick up commits from other subtrees but they hopefully won’t contain the subtree folder. I don’t think there is an unambiguous way to distinguish a subtree merge from a regular merge - the message produced is pretty generic. It may be possible to check reachability of all known subtrees, but that adds a fair bit of complexity.
>
> That leaves us with the question of how to record the empty mainline commits. The most correct result for your repro is probably four commits (add/delete everything/restore/modify), but I can see that falling over in a scenario where deleting a subtree is more like unlinking a library than editing that library to do nothing.
>
> Is it sufficiently correct for your scenario to treat ‘restore file1’ as the initial subtree commit?

My reproduction scenario is really a demonstration of the real issue I encountered. Running the initial "subtree split" on the real repo takes about 40 minutes so I wanted something trivial that shows the same issue. In the demonstration case (i.e., actually removing and readding the subtree) I think it's reasonable to start with the commit that added it back.

Overall I think your proposed algorithm is reasonable (even though I think it won't address some of the cases in our repo). Will your algorithm allow us to pass $dir to git rev-list, for the initial split?

My actual issue stems from the way svn2git converted some odd svn history, and is described in more detail on the freebsd-git mailing list at https://lists.freebsd.org/pipermail/freebsd-git/2019-November/000218.html.

Perhaps we can have some command-line options to provide metadata for cases that cannot be inferred? The cases in our repo come from svn2git creating subtree merges to represent updates from vendor code. AFAIK these should be basically identical to what subtree creates, except that we don't have any of the metadata it adds.

For a concrete example (from the repo at https://github.com/freebsd/freebsd), 7f3a50b3b9f8 is a mainline commit that added a new subtree, from 9ee787636908. I think that if I could inform subtree split that 9ee787636908 is the root it would work for me.
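
To make the rule quoted above concrete, here is a rough sketch of how I read it, written in Python over a toy commit graph rather than real git plumbing. Everything in it is illustrative: the names classify, ancestors and HISTORY are made up, the commit ids are placeholders, and the subtree_root argument stands in for the kind of user-supplied root hint suggested here. It is not an existing git-subtree option, and this is not how git-subtree is implemented.

```python
from typing import Dict, List, Optional, Set

# Toy history (hypothetical ids): commit -> list of parent commits.
# m1 is the mainline root, s1 is the subtree root,
# m3 is the mainline merge that brings the subtree in.
HISTORY: Dict[str, List[str]] = {
    "m1": [],
    "m2": ["m1"],
    "s1": [],
    "s2": ["s1"],
    "m3": ["m2", "s2"],
}


def ancestors(graph: Dict[str, List[str]], commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable via parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def classify(
    graph: Dict[str, List[str]],
    commit: str,
    mainline_root: str,
    subtree_root: Optional[str] = None,
) -> str:
    """Apply the three quoted rules to one commit.

    subtree_root is None when the root is unknown (the initial split);
    passing it explicitly models a user-provided hint (an assumption).
    """
    # Rule 1: subtree root unknown, so everything is mainline.
    if subtree_root is None:
        return "mainline"
    reachable = ancestors(graph, commit)
    # Rule 2: subtree root reachable and mainline root not reachable.
    if subtree_root in reachable and mainline_root not in reachable:
        return "subtree"
    # Rule 3: otherwise treat as mainline.
    return "mainline"


if __name__ == "__main__":
    for commit in HISTORY:
        print(commit, classify(HISTORY, commit, "m1", "s1"))
```

In a real repository the reachability test would presumably be done by git itself, for example with git merge-base --is-ancestor, rather than by walking the graph by hand. For the FreeBSD case, 9ee787636908 would play the role of subtree_root.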