Constantine Plotnikov <constantine.plotnikov@xxxxxxxxx> writes: > Somewhere between the git 1.7.0.2 and the git 1.7.2.0 the rename > detection started to fail with fatal error on some files in our > repository. The bug could be seen on the public IntelliJ IDEA > repository (about 760M in size), but our users have reported it as > well. > > To reproduce the error, run the following sequence of the commands: > > git clone git://git.jetbrains.org/idea/community.git idea > cd idea > git log -M --follow --name-only -- > platform/lang-api/src/com/intellij/lang/documentation/CompositeDocumentationProvider.java > > As result "fatal: internal error in diff-resolve-rename-copy" is > written on stderr. This is somewhat unexpected result. Git 1.7.0.2 and > 1.6.5.2 seems to work without visible problems. This is interesting. I actually see another potential "funny" (and that is why Linus is CC'ed --- scroll down to "funny"), which is unrelated. diff_tree_sha1() essentially: - given two trees, runs straightforward diff-tree without any rename detection; - under --follow mode, if it has only one change that deletes a path, runs try_to_follow_renames(), which: - runs the same diff-tree with deep rename detection but without any path limiter; - if we find a rename or copy that explains why we saw the deletion in the first step, replace the deletion record with the rename or copy. also set up to follow the old name from now on. - then return to the caller. The diff frontends (diff-tree, diff-files and diff-index) are expected to leave vanilla filepairs and let the diffcore backend to find renames and copies via a call to diff_resolve_rename_copy() in diffcore_std(). We however have a special "hack" in "diff-tree --follow". If it finds only one diff_queue entry that creates a path, it internally runs itself again with the rename/copy detection on, without any pathspec, to see if the creation is an artifact of the pathspec (iow, if there is a source of rename/copy into the created path), and replaces that creation record with a rename record in the diff_queue. When this is done, we do not want the regular resolve_rename_copy() to kick in (i.e. the "follow" hack already made a pair). But what 1da6175 (Make diffcore_std only can run once before a diff_flush, 2010-05-06) did is clearly wrong. Not wanting to call resolve-rename-copy does not mean we do not want to run the rest of what diffcore_std() does at all! For example, "-S" and "--diff-filter=" options are processed in that function; the exit status of the command based on the presense of difference is computed in the function, too. Another potential "funny" is this (unrelated to the reported issue). The "--follow" logic is called from diff_tree_sha1() function, but the input trees to diff_tree_sha1() are not necessarily the top-level trees (compare_tree_entry() calls it while it recursively descends into subtrees). For example, with the example Constantine gave us, the first "try-to-follow-renames" call happens with the "base" set to "platform/" but the rename source is actually "lang-api/src/com/intellij/..." hierarchy, so it is a wasted call. I think we only want to run the rename following at the very top level, i.e. like the attached patch. Linus, what do you think? Am I missing something obvious? diff --git a/tree-diff.c b/tree-diff.c index 1fb3e94..5b68c08 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -412,7 +412,7 @@ int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const cha init_tree_desc(&t1, tree1, size1); init_tree_desc(&t2, tree2, size2); retval = diff_tree(&t1, &t2, base, opt); - if (DIFF_OPT_TST(opt, FOLLOW_RENAMES) && diff_might_be_rename()) { + if (!*base && DIFF_OPT_TST(opt, FOLLOW_RENAMES) && diff_might_be_rename()) { init_tree_desc(&t1, tree1, size1); init_tree_desc(&t2, tree2, size2); try_to_follow_renames(&t1, &t2, base, opt); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html