Jeff King <peff@xxxxxxxx> writes:

>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.

Correct. This is not even a limitation but is outside the scope of
rerere. Let's understand that first.

A semantic conflict that requires an evil merge that touches a file
that is not involved in any textual conflict during a merge will
happen even if there is *NO* textual merge conflict.

Imagine that there is a global variable 'xyzzy' used in many places in
the code, and then a side branch forks from the mainline. The mainline
renames the variable to 'frotz' in the entire codebase, while the side
branch adds one more place that the variable is used under its
original name. Then you merge these two branches.

This will textually merge cleanly if the place the side branch adds a
new mention of 'xyzzy' is textually far from any block of text in the
common ancestor that has been updated on the mainline while these two
branches diverged. "git checkout mainline && git merge side" will
cleanly automerge, yet the result is not correct. The new mention of
'xyzzy' added by the merge needs to be corrected to 'frotz'.

Now we take that as the baseline and further imagine that during the
time these two branches diverged, the mainline also updated some
documentation of something totally unrelated to 'xyzzy' vs 'frotz'
variable. Perhaps README was updated, or something. The side branch
also updated the same file in a different way. This time, the changes
to this same file may result in textual conflict. "git checkout
mainline && git merge side" will result in a conflict, whose
resolution may be recorded by rerere for that file.
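
As a rough illustration of the two scenarios above, here is a toy
three-way merge in Python. It is only a sketch under stated
assumptions: a "tree" is modelled as a dict of named regions, which is
far coarser than the line-based hunks a real merge works with, and the
region names (decl, use1, use2) are made up for the example.

```python
# Toy model only: a tree maps a (hypothetical) region name to its text.
# This is NOT how git merges; it just mimics the three-way rule per region.

def merge_region(base, ours, theirs):
    """Three-way merge of one region; returns (text, conflicted)."""
    if ours == theirs:
        return ours, False      # both sides agree
    if ours == base:
        return theirs, False    # only their side changed it
    if theirs == base:
        return ours, False      # only our side changed it
    return None, True           # both changed it differently


def merge_trees(base, ours, theirs):
    """Merge every region; returns (merged tree, conflicted regions)."""
    merged, conflicts = {}, []
    for region in sorted(set(base) | set(ours) | set(theirs)):
        text, conflicted = merge_region(
            base.get(region), ours.get(region), theirs.get(region))
        if conflicted:
            conflicts.append(region)
        elif text is not None:
            merged[region] = text
    return merged, conflicts


base = {"decl": "int xyzzy;", "use1": "xyzzy++;", "README": "old docs"}
# The mainline renames 'xyzzy' to 'frotz' everywhere it knows about.
mainline = {"decl": "int frotz;", "use1": "frotz++;", "README": "old docs"}
# The side branch adds one more use under the original name.
side = dict(base, use2="print(xyzzy);")

# Scenario 1: no textual conflict, yet the result is semantically wrong.
merged, conflicts = merge_trees(base, mainline, side)

# Scenario 2: both sides also touch README in different ways.
merged2, conflicts2 = merge_trees(
    base,
    dict(mainline, README="mainline docs"),
    dict(side, README="side docs"),
)
```

In the first scenario nothing conflicts, but the merged tree declares
'frotz' while use2 still mentions 'xyzzy'. In the second, the only
conflicted region is README, and the stale 'xyzzy' in use2 is still
there, untouched by that conflict.
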
It should be crystal clear that this conflict does *not* have anything
to do with the semantic conflict between 'xyzzy' vs 'frotz'.

The realization we must draw from the above observation is that what
the "merge-fix" machinery in the Reintegrate script you cited in your
message tries to help with, namely the semantic conflict,
fundamentally cannot be tied to any textual merge conflict that may
(or may not) happen. That is what makes the issue outside the scope
of rerere.

The above is not to say that the need to record and replay such evil
merges to solve semantic conflicts does not exist. Far from it. It is
just clarifying that it is a wrong approach to try to "teach" rerere
to somehow handle that case as well.

If we wanted to port the "merge-fix" logic, and I do wish it would
happen some day, the update belongs to "git merge".

You were too kind to call the "merge-fix" logic in Reintegrate "the
state of the art", but I am not happy about its limitations. Here are
the things I wish to have in an ideal version of the "merge-fix"
logic, which does not exist yet:

 * There is a database of "to be cherry-picked" commits, keyed by a
   pair of branch names. That is, given branches A and B, the
   database will return 0 or more commits that can be cherry-picked.
   The order of branch names in the pair is immaterial, i.e. asking
   the database for cherry-pickable commits keyed by <A, B> and keyed
   by <B, A> will yield the identical set of commits.

 * When merging commit X to commit Y, "git merge" in the ideal world
   does the following:

   - It first does what it currently does, i.e. three-way merge with
     the merge strategy and applying rerere for re-application of an
     earlier resolution of textual conflicts.

   - Then, it looks up the database to find the keys <A, B> where A
     is in X but not in Y, and B is not in X but in Y. The commits
     found under those keys are cherry-picked and squashed into the
     result of the above.
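
The lookup described in the list above could be sketched roughly as
follows. Nothing like this exists in Git; the class name MergeFixDB,
its methods, and the branch and commit names are all hypothetical.
The sketch also assumes the caller already knows which branches are
contained in X and in Y (in a real repository that would be a
reachability question) and passes them in as plain sets.

```python
class MergeFixDB:
    """Hypothetical store of merge-fix commits keyed by an unordered pair."""

    def __init__(self):
        self._fixes = {}  # frozenset({A, B}) -> list of commit names

    def record(self, a, b, fix_commit):
        if a == b:
            raise ValueError("a merge-fix needs two distinct branches")
        # frozenset makes <A, B> and <B, A> the same key.
        self._fixes.setdefault(frozenset((a, b)), []).append(fix_commit)

    def lookup(self, a, b):
        return list(self._fixes.get(frozenset((a, b)), []))

    def fixes_for_merge(self, in_x, in_y):
        """Commits to cherry-pick when merging X into Y.

        in_x and in_y are the sets of branch names contained in X and Y.
        A key matches when one member is only in X and the other is
        only in Y, i.e. this merge is what first brings them together.
        """
        only_x = in_x - in_y
        only_y = in_y - in_x
        found = []
        for pair, commits in self._fixes.items():
            a, b = tuple(pair)
            if (a in only_x and b in only_y) or (a in only_y and b in only_x):
                found.extend(commits)
        return found


db = MergeFixDB()
db.record("rename-to-frotz", "new-xyzzy-use", "fix-1")

# X carries the new use, Y carries the rename: the fix applies.
first = db.fixes_for_merge({"new-xyzzy-use"}, {"rename-to-frotz"})

# Y already has both branches, so nothing new meets in this merge.
second = db.fixes_for_merge({"new-xyzzy-use"},
                            {"rename-to-frotz", "new-xyzzy-use"})
```

Using a frozenset as the key is one simple way to get the "order of
branch names is immaterial" property; the second call returns nothing
because the pair was already brought together by an earlier merge.
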

The intent is that a pair <A, B> represents the mainline and side
branch in the above example, where A renamed 'xyzzy' to 'frotz' and B
added a new reference to 'xyzzy'. And the cherry-pickable commit found
in the database is to tweak the 'xyzzy' B adds into 'frotz'.

I said A and B in the above are branch names, but in the ideal world,
they can be commit object names (possibly in the middle of a branch),
as long as we can reliably update the database's keys every time "git
rebase" etc. rewrites commits.

To populate the database, we'd need the reverse.

 * When merging branch B into branch A (or the other way around) for
   the first time, "git merge" would do what it currently does.

 * The user then concludes the merge to resolve *ONLY* the textual
   conflict, and commits the result. It is important that no
   additional evil merge to correct for semantic conflicts is done in
   this step. Note that if the auto-merge cleanly succeeds, this step
   may not even exist.

 * Then the user makes another commit to correct for semantic
   conflicts (aka "merge-fix").

 * Then the user tells Git that semantic conflicts were resolved and
   need to be recorded (just like running "git rerere" manually,
   before "git commit" automatically does it for them these days).
   This will result in the following:

   - The database is updated so that key <A, B> yields the
     "merge-fix" commit;

   - The head is detached at the tip of branch A before the merge;

   - "git merge B" is done again, which _should_ reproduce the state
     immediately after the user committed the "merge-fix";

   - The tip of branch A is reset to the result of the above.

The merge-fix logic in Reintegrate is a poor-man's emulation of the
above ideal. A value its database yields is not a set of commits, but
a single commit, and instead of getting keyed by a pair of branch
names, the database is keyed by a single branch name (i.e. recording
"I had trouble when merging this branch" without saying "... to the
integration branch that already had this other branch"), so the
look-up does not have to do "A is in X but not in Y, and B is not in X
but in Y". It is still usable but the database needs to be reorganized
every time the order of topics merged to 'pu' is changed.
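
To see why the single-key scheme is sensitive to merge order while the
pair-keyed one is not, here is a small simulation. It is a sketch, not
the Reintegrate script's actual code: both replay functions and the
topic names are hypothetical, and an integration branch is modelled as
nothing more than a list of topics merged one after another.

```python
def replay_single_key(order, fix_by_topic):
    """Single-key lookup: the fix is found by the topic being merged."""
    return [(topic, fix_by_topic.get(topic)) for topic in order]


def replay_pair_key(order, fix_by_pair):
    """Pair-keyed lookup: a fix fires when its second topic arrives."""
    merged, log = set(), []
    for topic in order:
        fixes = [fix for pair, fix in fix_by_pair.items()
                 if topic in pair and (pair - {topic}) <= merged]
        log.append((topic, fixes))
        merged.add(topic)
    return log


single = {"new-xyzzy-use": "fix-1"}
paired = {frozenset(("rename-to-frotz", "new-xyzzy-use")): "fix-1"}

original = ["rename-to-frotz", "new-xyzzy-use"]
swapped = ["new-xyzzy-use", "rename-to-frotz"]

single_original = replay_single_key(original, single)
single_swapped = replay_single_key(swapped, single)
pair_original = replay_pair_key(original, paired)
pair_swapped = replay_pair_key(swapped, paired)
```

With the original order both schemes apply fix-1 when new-xyzzy-use is
merged. After swapping the topics, the single-key lookup still offers
fix-1 at the merge of new-xyzzy-use, before the rename is even present,
whereas the pair-keyed lookup waits until rename-to-frotz arrives.
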