On Sun, Apr 15, 2018 at 5:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Elijah Newren <newren@xxxxxxxxx> writes: > >> @@ -362,13 +363,17 @@ static int git_merge_trees(struct merge_options *o, >> init_tree_desc_from_tree(t+2, merge); >> >> rc = unpack_trees(3, t, &o->unpack_opts); >> + cache_tree_free(&active_cache_tree); >> + >> + o->orig_index = the_index; >> + the_index = tmp_index; >> + >> /* >> - * unpack_trees NULLifies src_index, but it's used in verify_uptodate, >> - * so set to the new index which will usually have modification >> - * timestamp info copied over. >> + * src_index is used in verify_uptodate, but was NULLified in >> + * unpack_trees, so we need to set it back to the original index. >> */ > > Was NULLified? I thought that the point of src/dst distinction > Linus introduced long time ago at 34110cd4 ("Make 'unpack_trees()' > have a separate source and destination index", 2008-03-06) was that > we can then keep the source side of the traversal unmodified. That comment is messed up; maybe I edited and re-edited the comment multiple times and then didn't notice the big problems when re-reading? Anyway, I should move the comment a few lines up, and make the code instead read: /* * Update the_index to match the new results, AFTER saving a copy * in o->orig_index. Update src_index to point to the saved copy. * (verify_uptodate() checks src_index, and the original index is * the one that had the necessary modification timestamps.) */ o->orig_index = the_index; the_index = tmp_index; o->unpack_opts.src_index = &o->orig_index; >> static int would_lose_untracked(const char *path) >> { >> - return !was_tracked(path) && file_exists(path); >> + /* >> + * This may look like it can be simplified to: >> + * return !was_tracked(o, path) && file_exists(path) >> + * but it can't. This function needs to know whether path was >> + * in the working tree due to EITHER having been tracked in the >> + * index before the merge OR having been put into the working copy >> + * and index by unpack_trees(). Due to that either-or requirement, >> + * we check the current index instead of the original one. >> + */ > > If this path was created by merge-recursive, not by unpack_trees(), > what does this function want to say? Say, we are looking at path P, > the other branch we are merging moved some other path Q to P (while > our side modified contents at path Q). Then path P we are looking > at has contents of Q at the merge base at stage #1, the contents of > Q from our HEAD at stage #2 and the contents of P from the other > branch at stage #3. The code below says "path P is OK, we won't > lose it" in such a case, but it is unclear if the above comment > wants to also cover that case. Oh, boy, here be dragons... The comment as-is actually does cover your example case with Q and P: unpack_trees(), which is unaware of renames, will see that P only exists on one side of history and thus load it into the index at stage 0 rather than stage 3. But your general comment about whether something else in merge-recursive could create a path in the current index after unpack_trees() is interesting...it touches on a pitfall that has bit me multiple times. There is a required ordering in merge-recursive.c that for any given path, the working directory must be updated before the index is -- otherwise, would_lose_untracked() will return faulty information. update_file_flags() has this ordering builtin, update_stages() has a big obnoxious comment at the beginning about how it should not be called until after update_file() is, apply_directory_rename_modifications() has a big comment about this ~80% of the way through the function (look for "would_lose_untracked"), and conflict_rename_rename_2to1() has a big obnoxious comment near the end painstakingly pointing out that some code that feels like it would make more sense being combined with a previous function cannot be due to the update-working-directory-before-index requirement. I should probably add to this comment something about this annoying (and error-prone) ordering restriction, since this function is the source of those particular pains. Your suggested ideal-world rewrite (run unpack_trees() with unpack_opts.index_only=1, do merge in memory, then update working tree), would make this whole problem go away.