On 05/24, Junio C Hamano wrote: > Thomas Gummerer <t.gummerer@xxxxxxxxx> writes: > > > +Conflict normalization > > +---------------------- > > + > > +To try and re-do a conflict resolution, even when different merge > > +strategies are used, 'rerere' computes a conflict ID for each > > +conflict in the file. > > + > > +This is done by discarding the common ancestor version in the > > +diff3-style, and re-ordering the two sides of the conflict, in > > +alphabetic order. > > s/discarding.*-style/normalising the conflicted section to 'merge' style/ > > The motivation behind the normalization should probably be given > upfront in the first paragraph. It is to ensure the recorded > resolutions can be looked up from the rerere database for > application, even when branches are merged in different order. I am > not sure what you meant by even when different merge stratagies are > used; I'd drop that if I were writing the paragraph. What I meant was when different conflict styles are used, and when the branches are merged in different orders. But merge strategies is obviously not a good word for that. Will rephrase this. > > +Using this technique a conflict that looks as follows when for example > > +'master' was merged into a topic branch: > > + > > + <<<<<<< HEAD > > + foo > > + ======= > > + bar > > + >>>>>>> master > > + > > +and the opposite way when the topic branch is merged into 'master': > > + > > + <<<<<<< HEAD > > + bar > > + ======= > > + foo > > + >>>>>>> topic > > + > > +can be recognized as the same conflict, and can automatically be > > +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would > > +be calculated from 'bar<NUL>foo<NUL>' in both cases. > > You earlier talked about normalizing and reordering, but did not > talk about "concatenate both with NUL in between and hash", so the > explanation in the last two lines are not quite understandable by > mere mortals, even though I know which part of the code you are > referring to. When you talk about hasing, you may want to make sure > the readers understand that the branch label on <<< and >>> lines > are ignored. > > > +If there are multiple conflicts in one file, they are all appended to > > +one another, both in the 'preimage' file as well as in the conflict > > +ID. > > In case it was not clear (and I do not think it is to those who only > read your description and haven't thought things through > themselves), this concatenation is why the normalization by > reordering is helpful. Imagine that a common ancestor had a file > with a line with string "A" on it (I'll call such a line "line A" > for brevity in the following) in its early part, and line X in its > late part. And then you fork four branches that do these things: > > - AB: changes A to B > - AC: changes A to C > - XY: changes X to Y > - XZ: changes X to Z > > Now, forking a branch ABAC off of branch AB and then merging AC into > it, and forking a branch ACAB off of branch AC and then merging AB > into it, would yield the conflict in a different order. The former > would say "A became B or C, what now?" while the latter would say "A > became C or B, what now?" > > But the act of merging AC into ABAC and resolving the conflict to > leave line D means that you declare: > > After examining what branches AB and AC did, I believe that > making line A into line D is the best thing to do that is > compatible with what AB and AC wanted to do. > > So the conflict we would see when merging AB into ACAB should be > resolved the same way---it is the resolution that is in line with > that declaration. > > Imagine that similarly you had previously forked branch XYXZ from > XY, merged XZ into it, and resolved "X became Y or Z" into "X became > W". > > Now, if you forked a branch ABXY from AB and then merged XY, then > ABXY would have line B in its early part and line Y in its later > part. Such a merge would be quite clean. We can construct > 4 combinations using these four branches ((AB, AC) x (XY, XZ)). > > Merging ABXY and ACXZ would make "an early A became B or C, a late X > became Y or Z" conflict, while merging ACXY and ABXZ would make "an > early A became C or B, a late X became Y or Z". We can see there > are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X"). > > By sorting, we can give the conflict its canonical name, namely, "an > early part became B or C, a late part becames X or Y", and whenever > any of these four patterns appear, we can get to the same conflict > and resolution that we saw earlier. Without the sorting, we will > have to somehow find a previous resolution from combinatorial > explosion ;-) Thanks for the in depth explanation! I'll incorporate this into the document. > These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'", > 2016-05-23), the conflict ID can safely collide, i.e. hash > collisions that drops completely different conflicts and their > resolutions into the same .git/rr-cache/$id directory will not > interfere with proper operation of the system, thanks to that > rerere-multi topic that allows us to store multiple preimage > conflicts that happens to share the same conflict ID with their > corresponding postimage resolutions. > > In theory, we *should* be able to stub out the SHA-1 computation and > give every conflict the same ID and rerere should still operate > correctly, even though I haven't tried it yet myself. I gave this a quick try, and the test suite seems to pass with the hash computation giving the same ID to all conflicts.