Junio C Hamano <junkio@xxxxxxx> wrote:
> The detection of corresponding subtree is done by comparing the
> pathnames and types in the toplevel of the tree.
>
> Heuristics galore!

That's the git way ;-).

I have some concerns about the match-tree heuristic you are using
here. For example, it is very common for Java projects to have the
same tree "shape". Just look at egit/jgit for an example; the three
top level directories are:

  org.spearce.egit.core/
    META-INF/
    build.properties
    plugin.xml
    src/

  org.spearce.egit.ui/
    META-INF/
    build.properties
    plugin.xml
    src/

  org.spearce.jgit/
    META-INF/
    src/

If I were to treat the first two as subprojects, this new subtree
merge strategy might fail here, as it could easily match to the
wrong directory.

What about a different approach?

In a merge of commit#1 (parent project) and commit#2 (subproject)...

We have the set of merge bases readily available. We just have to
find out in each merge base where the files went from commit#2,
then modify commit#2 to conform to that same shape. Really that
isn't too different from a rename detection.

In other words do something like the following:

 a) Scan the parents of the merge base B for a commit that is in
    commit#2's ancestry but not commit#1's ancestry, except by the
    merge commit B. Such a parent must be from the project that
    commit#2 is also from. For the sake of explaining this, let's
    call this parent B^2.

 b) Perform a partial rename-diff between B^2 and B. The magic
    here is we need to discard any path in B that also appears in
    B^1 and B^2, and that has the same SHA-1 as in B^1, before we
    do the rename-diff.

 c) Find the most common prefix within the renamed files.

 d) Fit commit#2 to use that prefix, and merge.

Here's a real example. In 67c75759 you merged git-gui.git.
67c75759^1 is from git.git, 67c75759^2 is from git-gui.git.
The stock rename-diff:

  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 67c75759
  :100644 100644 c714d38... d99372a... M    .gitignore
  :100755 100755 8fac8cb... 7a10b60... M    GIT-VERSION-GEN
  :100644 100644 fd82d9d... 5d31e6d... M    Makefile
  :100644 100644 b95a137... b95a137... R100 TODO       git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100 git-gui.sh git-gui/git-gui.sh

The problem here is that both ^1 and ^2 define the first three
paths, so we think we modified them in the merge rather than moved
them. But these three files match ^1, as we did not do an evil merge
here. That's why they are showing as modified in this diff.

Now take 67c7 and whack those three files (step b above), and rediff.
(The whitespace before each name in the sed patterns is a literal
tab, which is what ls-tree puts in front of the path.)

  $ C=$(git ls-tree 67c75759 | sed '
      /	.gitignore$/d
      /	GIT-VERSION-GEN$/d
      /	Makefile$/d' | git mktree)
  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 $C
  :100644 100644 c714d38... c714d38... R100 .gitignore      git-gui/.gitignore
  :100755 100755 8fac8cb... 8fac8cb... R100 GIT-VERSION-GEN git-gui/GIT-VERSION-GEN
  :100644 100644 fd82d9d... fd82d9d... R100 Makefile        git-gui/Makefile
  :100644 100644 b95a137... b95a137... R100 TODO            git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100 git-gui.sh      git-gui/git-gui.sh

Wow, look at that, everything starts with 'git-gui/'! ;-)

Then we just need to pick the most popular common prefix of all
renamed paths and fit commit#2 to conform to that structure.
Finally we can run the merge through.

The (now functional) pretend object stuff can be useful here,
such as to make $C above so we can pass it off to diffcore.

I think popping off the 'git-gui/' prefix would be the same deal,
only we'd be looking at the old names to determine the prefix to
pop, rather than the new names.

We already do rename detection in merge-recursive. Slapping an
extra rename pass in front of things when it is invoked as
merge-subtree can't hurt performance that much.

Thoughts?

-- 
Shawn.
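
P.S. For what it's worth, step c) could be sketched roughly like the
following. This is only an illustration in Python, not anything that
exists in git; the real thing would presumably sit in C next to
diffcore. The function name most_common_prefix and the list of
(old, new) path pairs it takes are assumptions made up for this sketch.
It only considers renames where the new name is the old name with a
directory prefix stuck on the front.

```python
from collections import Counter


def most_common_prefix(renames):
    """Pick the directory prefix most often added by the renames.

    `renames` is a list of (old_path, new_path) pairs, e.g. as parsed
    from the R lines of a rename-diff (hypothetical input format).
    Returns the winning prefix, including its trailing '/', or None
    if no rename looks like "old path moved under a directory".
    """
    counts = Counter()
    for old, new in renames:
        # Only renames of the form new == prefix + old are candidates.
        if not old or not new.endswith(old):
            continue
        prefix = new[: len(new) - len(old)]
        # Require a real directory prefix, not an arbitrary string.
        if prefix.endswith("/"):
            counts[prefix] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # The rename pairs from the 67c75759 example above.
    example = [
        (".gitignore", "git-gui/.gitignore"),
        ("GIT-VERSION-GEN", "git-gui/GIT-VERSION-GEN"),
        ("Makefile", "git-gui/Makefile"),
        ("TODO", "git-gui/TODO"),
        ("git-gui.sh", "git-gui/git-gui.sh"),
    ]
    print(most_common_prefix(example))
```

Popping a prefix off would be the same counting with old and new
swapped. Renames that also change the file's basename are ignored
here, which a real implementation might not want.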