> On 23 Nov 2019, at 3:55 am, Ed Maste <emaste@xxxxxxxxxxx> wrote:
>
> I encountered an issue while trying to use git subtree with the
> FreeBSD svn->git mirror: I found that when "git subtree split"
> encounters a commit with an empty "git ls-tree" for the subdirectory
> being split, it ends up recording the original parent as the new
> parent in the split history that's being created. This then leads to
> unrelated history appearing in the split subtree.
>
> Below is a shell script that demonstrates the issue - this is not the
> precise case that I encountered in the FreeBSD repo, but the behaviour
> is identical (and it doesn't take nearly 10 minutes to run). Running
> the script and then "git log" of the commit printed by the final (git
> subtree) command includes the unrelated history in dir2/.
>
> It looks like this comes from the cache_set "$rev" "$rev" in
> process_split_commit() added in 39f5fff0d53. This is under the
> suspicious-looking "ugly. is there no better way to tell if this is a
> subtree vs. a mainline commit? Does it matter" comment. However, I
> don't yet understand enough of git-subtree's operation to propose a
> fix.
>
> --repro.sh--
> #!/bin/sh
>
> rm -rf subrepo-issue
> mkdir -p subrepo-issue
> cd subrepo-issue
>
> git init .
> mkdir -p dir1 dir2
> touch dir1/file1 dir2/file2
> git add dir1 dir2
> git commit -m 'initial commit'
> echo 'file2' > dir2/file2
> git commit -m 'file2 modified' dir2/file2
> git rm dir1/file1
> git commit -m 'remove file1'
> mkdir -p dir1
> touch dir1/file1
> git add dir1
> git commit -m 'restore file1'
> echo 'file1' > dir1/file1
> git commit -m 'file1 modified' dir1/file1
> git subtree split --prefix=dir1/
>

The algorithm I am looking at to replace the file-based mainline detection is:

- If subtree root is unknown (as on the initial split), everything is mainline.
- If subtree root is reachable and mainline root is not, it’s a subtree commit.
- Otherwise, treat as mainline.

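A rough sketch of the three rules above in shell, not the actual patch. The function names classify_commit, is_reachable and classify_rev are made up for illustration and do not exist in git-subtree, and it is an assumption that "reachable" would be tested with git merge-base --is-ancestor.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names exist in git-subtree.

# classify_commit SUBTREE_ROOT SUBTREE_REACHABLE MAINLINE_REACHABLE
# SUBTREE_ROOT is empty when no subtree root is known yet.
# The two *_REACHABLE arguments are "yes" or "no".
# Prints "mainline" or "subtree".
classify_commit () {
	subtree_root=$1
	subtree_reachable=$2
	mainline_reachable=$3

	if test -z "$subtree_root"
	then
		# Rule 1: unknown subtree root (initial split), so mainline.
		echo mainline
	elif test "$subtree_reachable" = yes && test "$mainline_reachable" = no
	then
		# Rule 2: only the subtree root is reachable.
		echo subtree
	else
		# Rule 3: everything else is treated as mainline.
		echo mainline
	fi
}

# is_reachable ROOT REV
# Succeeds if ROOT is an ancestor of REV (or is REV itself).
# Assumption: this is how reachability would be checked.
is_reachable () {
	git merge-base --is-ancestor "$1" "$2"
}

# classify_rev REV SUBTREE_ROOT MAINLINE_ROOT
# Looks up reachability with git, then applies the three rules.
classify_rev () {
	rev=$1
	sub=$2
	main=$3
	sub_ok=no
	main_ok=no
	if test -n "$sub" && is_reachable "$sub" "$rev"
	then
		sub_ok=yes
	fi
	if is_reachable "$main" "$rev"
	then
		main_ok=yes
	fi
	classify_commit "$sub" "$sub_ok" "$main_ok"
}
```

Run inside a repository, classify_rev "$rev" "$subtree_root" "$mainline_root" prints "subtree" only for a commit that descends from the subtree root and not from the mainline root. The decision is kept separate from the git calls so the rules can be checked on their own.
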
This will also pick up commits from other subtrees but they hopefully won’t contain the subtree folder. I don’t think there is an unambiguous way to distinguish a subtree merge from a regular merge - the message produced is pretty generic. It may be possible to check reachability of all known subtrees, but that adds a fair bit of complexity.

That leaves us with the question of how to record the empty mainline commits. The most correct result for your repro is probably four commits (add/delete everything/restore/modify), but I can see that falling over in a scenario where deleting a subtree is more like unlinking a library than editing that library to do nothing.

Is it sufficiently correct for your scenario to treat ‘restore file1’ as the initial subtree commit?