Re: How to selectively recreate merge state?

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 11 Dec 2009 11:24:24 -0800

Jakub Narebski <jnareb@xxxxxxxxx> writes:

> I have thought that if there exist stage #0 in index, git simply _ignores_
> higher stages, so git-add simply adds stage #0 and does not delete higher
> stages.

Then you thought wrong ;-).

Leaving resolved cruft in the main index (aka active_cache[]) will make
all the normal operation codepath unnecessarily complex.  They rely on "if
I see stage #0, there are no higher stages for the same path".  And extra
checks will slow things down.
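
To illustrate the invariant, here is a minimal sketch on a toy array.  It
is not git's real active_cache[] or struct cache_entry; every name in it
(toy_entry, toy_path_is_unmerged) is hypothetical and only meant to show
why a reader of the index can stop at the first entry it finds for a path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an index entry; NOT git's struct cache_entry. */
struct toy_entry {
    const char *path;
    int stage;              /* 0 = resolved, 1..3 = unmerged stages */
};

/*
 * Assumes the invariant described above: a path has either one stage #0
 * entry or only higher-stage entries, never both.  Under that assumption
 * the first entry found for a path already tells us whether the path is
 * unmerged, so no further scanning or extra checks are needed.
 */
int toy_path_is_unmerged(const struct toy_entry *idx, size_t n,
                         const char *path)
{
    assert(idx != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(idx[i].path, path) == 0)
            return idx[i].stage != 0;
    }
    return 0;               /* path is not in the index at all */
}
```

If stage #0 were allowed to coexist with higher stages, every caller like
this one would have to keep scanning past the stage #0 entry.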

But that does not necessarily mean the index is a wrong place to save away
pre-resolution information on resolved paths (read on).

Before suggesting a possible next move, there are a few things we should
notice while reading ec16779 (Add git-unresolve <paths>..., 2006-04-19):

 - This was done about only one year after git was born.  You should not
   take it for granted that the workflow it wanted to support makes sense.

   Considering that using "git add" to mark the resolution is to declare
   that you are _finished_ with that path, using it for other purposes
   (e.g. leaving a note that says "I've looked at and have one possible
   resolution in the file in the work tree, but I haven't verified the
   result yet", which is what the commit talks about) is simply an
   (ab|mis)use of the index.  Lossage of higher stage information by this
   misuse is the user's problem, and there is this thing called pen & paper
   the user can use for taking notes if s/he does not want to lose the
   original conflict information from the index.

 - Even if we for a moment consider that the workflow made some sense, the
   particular implementation is not suitable anymore for today's git.

   Again, this was done only one year after git was born, and back then
   "pull/merge" were the only things that left conflicts in everyday
   operations by end users, and not many people expected git to merge
   across renames.  It was sufficient to read the path the end user asked
   for from HEAD and MERGE_HEAD and pretend we "unresolved" in such a
   simpler world.

   But "merge" is not the primary thing that gives you conflicts anymore.
   "rebase", "cherry-pick", "stash apply" are much more widely used by
   ordinary users these days than back then, and reading from MERGE_HEAD
   wouldn't do any good for recreating what these operations did.  Even
   with "merge", stages #2 and #3 can come from a totally different path
   when using recursive and subtree strategies, so reading from
   HEAD/MERGE_HEAD is not as useful as it used to be.

In fact, considering that there are many ways conflicts can be left in the
index and there are only two ways that they are resolved in the index by
the user (and both eventually use a single function to do so), it would
make perfect sense to do the following:

 - Define a new index extension section to record "unresolve"
   information.

 - Every time add_index_entry_with_check() in read-cache.c records a stage
   0 entry while dropping higher stage entries for the same path, record
   these higher stage entries to the "unresolve" section.

 - An "update-index --unresolve" will use the information from this
   "unresolve" extension to recreate the unmerged state.

 - "rerere forget" that we earlier talked about in a separate thread will
   use exactly the same mechanism to get back the unmerged state to
   recompute the conflict identifier (this is why J6t is added to the Cc:
   list).

 - "checkout --conflict" _might_ want to also consider unresolving the
   path first using this information, if it finds the path the user asked to
   re-checkout with conflict markers has already been resolved.
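
The record-then-restore idea in the list above can be sketched on a toy
in-memory index.  This is only an illustration under stated assumptions:
the types and functions (toy_index, toy_add_resolved, toy_unresolve) are
hypothetical, the "unresolve" section is a plain array rather than a real
index extension, and sorting and replacing stale records for a path are
left out.  toy_add_resolved() plays the role the proposal assigns to
add_index_entry_with_check(); it is not that function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

struct toy_entry {
    const char *path;
    int stage;              /* 0 = resolved, 1..3 = conflict stages */
    const char *blob;       /* stand-in for an object name */
};

struct toy_index {
    struct toy_entry cache[TOY_MAX];      /* the "main index" */
    size_t nr;
    struct toy_entry unresolve[TOY_MAX];  /* hypothetical saved stages */
    size_t unresolve_nr;
};

/* Append without any invariant handling (used to set up conflicts). */
void toy_append(struct toy_entry *arr, size_t *nr, struct toy_entry e)
{
    assert(*nr < TOY_MAX);
    arr[(*nr)++] = e;
}

/*
 * Record a stage #0 entry for `path`.  Higher-stage entries for the same
 * path are dropped from the main index, but saved in `unresolve` first.
 */
void toy_add_resolved(struct toy_index *idx, const char *path,
                      const char *blob)
{
    size_t kept = 0;
    for (size_t i = 0; i < idx->nr; i++) {
        struct toy_entry e = idx->cache[i];
        if (strcmp(e.path, path) != 0)
            idx->cache[kept++] = e;       /* unrelated path: keep */
        else if (e.stage > 0)
            toy_append(idx->unresolve, &idx->unresolve_nr, e);
        /* an older stage #0 entry for the path is simply replaced */
    }
    idx->nr = kept;
    struct toy_entry resolved = { path, 0, blob };
    toy_append(idx->cache, &idx->nr, resolved);
}

/*
 * Recreate the unmerged state for `path` from the saved stages.
 * Returns the number of stages restored, or 0 if nothing was recorded
 * (in which case the main index is left untouched).
 */
int toy_unresolve(struct toy_index *idx, const char *path)
{
    int restored = 0;
    size_t kept = 0;

    for (size_t i = 0; i < idx->unresolve_nr; i++)
        if (strcmp(idx->unresolve[i].path, path) == 0)
            restored++;
    if (!restored)
        return 0;

    /* Drop the stage #0 entry so the invariant keeps holding. */
    for (size_t i = 0; i < idx->nr; i++)
        if (strcmp(idx->cache[i].path, path) != 0)
            idx->cache[kept++] = idx->cache[i];
    idx->nr = kept;

    /* Move the saved stages back into the main index. */
    kept = 0;
    for (size_t i = 0; i < idx->unresolve_nr; i++) {
        struct toy_entry e = idx->unresolve[i];
        if (strcmp(e.path, path) == 0)
            toy_append(idx->cache, &idx->nr, e);
        else
            idx->unresolve[kept++] = e;
    }
    idx->unresolve_nr = kept;
    return restored;
}
```

Because the saved stages come from whatever was in the index at the time
of resolution, the sketch does not care whether the conflict came from
"merge", "rebase", "cherry-pick" or "stash apply", which is the point of
recording at the single place where higher stages are dropped.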

It is important to think through to decide when we purge the "unresolve"
section.

If you run "read-tree", "checkout" to switch branches, or "reset" (any
option other than "--soft" which does not even touch the index), it is a
good sign that the information in the "unresolve" extension section is no
longer needed, so you can drop the section in these operations.

Optionally, write_index() could notice if there are no unmerged entries and
the cache_tree is fully valid---that is an indication that a tree object
has been written out of the now resolved index, and may (or may not) imply
that the "unresolve" information is no longer needed.  But I haven't
thought this last one through.  You could wish to unresolve even after you
committed your merge (you _could_ wish for anything after all), but I do
not yet know if granting that wish makes much sense.
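
The purge rules discussed so far can be written down as a small decision
table.  This is a sketch only: the enum and both functions are hypothetical
names, the list of operations is limited to the ones mentioned above, and
the second function encodes the optional heuristic that has explicitly not
been thought through.

```c
#include <assert.h>

/* Hypothetical labels for the operations named in the text. */
enum toy_op {
    TOY_OP_READ_TREE,
    TOY_OP_CHECKOUT_BRANCH,   /* "checkout" used to switch branches */
    TOY_OP_RESET_SOFT,        /* does not even touch the index */
    TOY_OP_RESET_OTHER,       /* any reset option other than --soft */
    TOY_OP_ADD                /* resolving a path */
};

/* Returns 1 if the operation should drop the "unresolve" section. */
int toy_op_drops_unresolve(enum toy_op op)
{
    switch (op) {
    case TOY_OP_READ_TREE:
    case TOY_OP_CHECKOUT_BRANCH:
    case TOY_OP_RESET_OTHER:
        return 1;
    case TOY_OP_RESET_SOFT:   /* index untouched, nothing to purge */
    case TOY_OP_ADD:          /* this is what fills the section */
        return 0;
    }
    return 0;
}

/*
 * Optional heuristic: no unmerged entries plus a fully valid cache_tree
 * suggests a tree was written from the resolved index.  Whether that
 * should actually trigger a purge is left open in the text.
 */
int toy_index_looks_committed(int unmerged_entries,
                              int cache_tree_fully_valid)
{
    return unmerged_entries == 0 && cache_tree_fully_valid;
}
```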

There may be other cases where we _must_ drop "unresolve".