Re: [Bug?] Information around newlines lost in merge

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 20 Jun 2023 10:44:29 -0700

Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> When merging two files which contain newlines at the end, the blob
> created (with conflicts) is the same as when merging two files
> without newlines at the end.
>
> If this is expected behavior, what would be the best way to
> differentiate the two? This is not a newly introduced bug, but rather
> the behavior since the start, which makes me think that I'm missing
> something (verified via git bisect on latest git).

Strictly speaking, I suspect that the behaviour was different before
we introduced in-core 3-way merges of two blobs; back then we ran
the "merge" program (from the RCS suite).

If we start from an empty file and have two sides add different
incomplete lines (i.e. your "half" example, but without the leading
blank line), like so:

	$ >O
	$ M="with a single line added by side %s (without terminating LF)."
	$ printf "$M" A >A
	$ printf "$M" B >B

The original "git merge" that used the external "merge" program
would have produced this:

	$ merge -p B O A 2>E
	<<<<<<< B
	with a single line added by side B (without terminating LF).=======
	with a single line added by side A (without terminating LF).>>>>>>> A
	$ cat E
	merge: warning: conflicts during merge

That is, the output would be a mess that cannot even be machine
parsed.  It probably changed, slightly for the better, when we
switched to our own internal 3-way merge of two blobs, exposed as
"git merge-file", which gives you:

	$ git merge-file --diff3 -p A O B
	<<<<<<< A
	with a single line added by side A (without terminating LF).
	||||||| O
	=======
	with a single line added by side B (without terminating LF).
	>>>>>>> B

And as you found out, if we added a terminating LF to A and/or B, the
output would be the same.  You could argue that the result is at
least machine parseable, even though it is less faithful to the input
than the output from "merge" we saw above.
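This is easy to check.  A minimal sketch, assuming `git` is on `$PATH` (`git merge-file` does not need a repository) and reusing the format string from the example above; the temporary file names are made up.  It runs the merge once with incomplete lines on both sides and once with terminating LFs, then compares the two outputs byte for byte:

```shell
#!/bin/sh
# Sketch: the terminating-LF bit of each side does not survive a
# conflicted "git merge-file".  Assumes git is on $PATH.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

M='with a single line added by side %s (without terminating LF).'
: >O                              # common ancestor: an empty file

printf "$M" A >A                  # both sides end in an incomplete line
printf "$M" B >B
git merge-file -p A O B >without  # exits with the number of conflicts

printf "$M\n" A >A                # the same lines, now terminated by LF
printf "$M\n" B >B
git merge-file -p A O B >with

# Make sure a conflict was actually produced before comparing.
grep -q '^<<<<<<< A' without || { echo "no conflict produced" >&2; exit 1; }

# Compare files, not $(...) captures: command substitution strips
# trailing LFs and would hide exactly the difference we care about.
if cmp -s without with; then
	echo "identical: the missing-LF bit is lost"
else
	echo "different"
fi
```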

The conflict markers the merge machinery inserts ("7 repeated marker
characters followed by a random label string") cannot be relied on
if you are building a truly automated conflict resolver, so the loss
of this one bit of information from each side may be the least of
your problems.  At the same time, it means you _could_ propose an
augmented output format, perhaps like this one:

	$ git merge-file --diff3 -p A O B
	<<<<<<< A
	with a single line added by side A (without terminating LF).
	\ No newline at end of file
	||||||| O
	=======
	with a single line added by side B (without terminating LF).
	\ No newline at end of file
	>>>>>>> B
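To illustrate how an automated resolver could use such a line, here is a minimal sketch of a hypothetical "take side A" resolver.  No git release emits this format, so the input is handcrafted; the `take_ours` name is made up, and the sketch assumes the extra line reads exactly `\ No newline at end of file` (as in diff output) and that markers are 7 characters long.  It keeps the common lines and side A of each conflict, and drops the final LF when side A was marked as incomplete:

```shell
#!/bin/sh
# take_ours: hypothetical resolver for the *proposed* augmented format.
# Keeps common lines and side A of every conflict; drops the base and
# side B.  A "\ No newline at end of file" line on side A means the
# result must end without a terminating LF.
take_ours () {
	awk '
	BEGIN { keep = 1 }
	index($0, "<<<<<<< ") == 1 { in_c = 1; keep = 1; next }
	in_c && index($0, "||||||| ") == 1 { keep = 0; next }
	in_c && $0 == "=======" { keep = 0; next }
	in_c && index($0, ">>>>>>> ") == 1 { in_c = 0; keep = 1; next }
	$0 == "\\ No newline at end of file" { if (keep) nolf = 1; next }
	# Hold back one line so the very last one can be printed without LF.
	keep { if (have) print buf; buf = $0; have = 1 }
	END { if (have) printf "%s%s", buf, (nolf ? "" : "\n") }
	'
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Handcrafted input in the proposed format.
printf '%s\n' \
	'<<<<<<< A' \
	'with a single line added by side A (without terminating LF).' \
	'\ No newline at end of file' \
	'||||||| O' \
	'=======' \
	'with a single line added by side B (without terminating LF).' \
	'\ No newline at end of file' \
	'>>>>>>> B' >augmented

take_ours <augmented >resolved
```

Note that the sketch trusts every line that looks like a marker; it has no way to tell a real marker from a payload line that merely looks like one.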

It has exactly the same problem we already have with the conflict
section separator lines: lines that look exactly like these extra
lines _could_ exist in the payload.  So it is not creating a new
problem, but people may have built, and be happy enough with,
incomplete automation that relies on the faulty assumption that the
merged payload never contains lines that can be mistaken for
conflict section separator lines, and for them such an augmented
output format _will_ be a breaking change.
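The existing ambiguity is easy to provoke.  A minimal sketch, assuming `git` is on `$PATH`, with made-up file names and contents: side A adds a line that happens to read `=======`, so the conflicted output contains two lines that a naive marker-splitting resolver would treat as separators:

```shell
#!/bin/sh
# Sketch: a payload line that looks like a conflict separator.
# Assumes git is on $PATH; file names and contents are made up.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

: >O                                  # empty common ancestor
printf '%s\n' 'side A text' '=======' 'more side A text' >A
printf '%s\n' 'side B text' >B
git merge-file -p A O B >out          # exit status = number of conflicts

# Count lines that look like the "=======" separator; the real one and
# the payload line cannot be told apart by looking at the output.
grep -c '^=======$' out
```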

So, I dunno.