Re: [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 30 Jul 2018 10:45:15 -0700

Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:

> Currently rerere can't handle nested conflicts and will error out when
> it encounters such conflicts.  Do that by recursively calling the
> 'handle_conflict' function to normalize the conflict.
>
> The conflict ID calculation here deserves some explanation:
>
> As we are using the same handle_conflict function, the nested conflict
> is normalized the same way as for non-nested conflicts, which means
> the ancestor in the diff3 case is stripped out, and the parts of the
> conflict are ordered alphabetically.
>
> The conflict ID is however is only calculated in the top level
> handle_conflict call, so it will include the markers that 'rerere'
> adds to the output.  e.g. say there's the following conflict:
>
>     <<<<<<< HEAD
>     1
>     =======
>     <<<<<<< HEAD
>     3
>     =======
>     2
>     >>>>>>> branch-2
>     >>>>>>> branch-3~

Hmph, I vaguely recall that I made inner merges to use the conflict
markers automatically lengthened (by two, if I recall correctly)
than its immediate outer merge.  Wouldn't the above look more like

     <<<<<<< HEAD
     1
     =======
     <<<<<<<<< HEAD
     3
     =========
     2
     >>>>>>>>> branch-2
     >>>>>>> branch-3~

Perhaps I am not recalling it correctly.

> it would be recorde as follows in the preimage:
>
>     <<<<<<<
>     1
>     =======
>     <<<<<<<
>     2
>     =======
>     3
>     >>>>>>>
>     >>>>>>>
>
> and the conflict ID would be calculated as
>
>     sha1(1<NUL><<<<<<<
>     2
>     =======
>     3
>     >>>>>>><NUL>)
>
> Stripping out vs. leaving the conflict markers in place in the inner
> conflict should have no practical impact, but it simplifies the
> implementation.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx>
> ---
>  Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
>  rerere.c                           | 10 +++++--
>  t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
> index 4102cce7aa..60d48dc4fe 100644
> --- a/Documentation/technical/rerere.txt
> +++ b/Documentation/technical/rerere.txt
> @@ -138,3 +138,45 @@ SHA1('B<NUL>C<NUL>').
>  If there are multiple conflicts in one file, the sha1 is calculated
>  the same way with all hunks appended to each other, in the order in
>  which they appear in the file, separated by a <NUL> character.
> +
> +Nested conflicts
> +~~~~~~~~~~~~~~~~
> +
> +Nested conflicts are handled very similarly to "simple" conflicts.
> +Similar to simple conflicts, the conflict is first normalized by
> +stripping the labels from conflict markers, stripping the diff3
> +output, and the sorting the conflict hunks, both for the outer and the
> +inner conflict.  This is done recursively, so any number of nested
> +conflicts can be handled.
> +
> +The only difference is in how the conflict ID is calculated.  For the
> +inner conflict, the conflict markers themselves are not stripped out
> +before calculating the sha1.
> +
> +Say we have the following conflict for example:
> +
> +    <<<<<<< HEAD
> +    1
> +    =======
> +    <<<<<<< HEAD
> +    3
> +    =======
> +    2
> +    >>>>>>> branch-2
> +    >>>>>>> branch-3~
> +
> +After stripping out the labels of the conflict markers, and sorting
> +the hunks, the conflict would look as follows:
> +
> +    <<<<<<<
> +    1
> +    =======
> +    <<<<<<<
> +    2
> +    =======
> +    3
> +    >>>>>>>
> +    >>>>>>>
> +
> +and finally the conflict ID would be calculated as:
> +`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
> diff --git a/rerere.c b/rerere.c
> index a35b88916c..f78bef80b1 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
>  		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
>  	} hunk = RR_SIDE_1;
>  	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
> -	struct strbuf buf = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
>  	int has_conflicts = -1;
>  
>  	while (!io->getline(&buf, io)) {
>  		if (is_cmarker(buf.buf, '<', marker_size)) {
> -			break;
> +			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
> +				break;
> +			if (hunk == RR_SIDE_1)
> +				strbuf_addbuf(&one, &conflict);
> +			else
> +				strbuf_addbuf(&two, &conflict);

Hmph, do we ever see the inner conflict block while we are skipping
and ignoring the common ancestor version, or it is impossible that
we see '<' only while processing either our or their side?

> +			strbuf_release(&conflict);
>  		} else if (is_cmarker(buf.buf, '|', marker_size)) {
>  			if (hunk != RR_SIDE_1)
>  				break;
> diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> index 34f0518a5e..d63fe2b33b 100755
> --- a/t/t4200-rerere.sh
> +++ b/t/t4200-rerere.sh
> @@ -602,4 +602,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
>  	git rerere clear
>  '
>  
> +test_expect_success 'rerere with inner conflict markers' '
> +	git reset --hard &&
> +
> +	git checkout -b A master &&
> +	echo "bar" >test &&
> +	git add test &&
> +	git commit -q -m two &&
> +	echo "baz" >test &&
> +	git add test &&
> +	git commit -q -m three &&
> +
> +	git reset --hard &&
> +	git checkout -b B master &&
> +	echo "foo" >test &&
> +	git add test &&
> +	git commit -q -a -m one &&
> +
> +	test_must_fail git merge A~ &&
> +	git add test &&
> +	git commit -q -m "will solve conflicts later" &&
> +	test_must_fail git merge A &&
> +
> +	echo "resolved" >test &&
> +	git add test &&
> +	git commit -q -m "solved conflict" &&
> +
> +	echo "resolved" >expect &&
> +
> +	git reset --hard HEAD~~ &&
> +	test_must_fail git merge A~ &&
> +	git add test &&
> +	git commit -q -m "will solve conflicts later" &&
> +	test_must_fail git merge A &&
> +	cat test >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done