Re: Git diff misattributes the first word of a line to the previous line

Johannes Sixt <j6t@xxxxxxxx> · Thu, 13 Oct 2022 08:45:55 +0200

Am 13.10.22 um 07:51 schrieb Gurjeet Singh:
> Git diff seems to get confused about word boundaries, and includes the
> first word from the next line.

No, that would misattribute the perceived malfunction.

> It seems that the first word of a line gets attributed to the previous
> line, ignoring the fact that there's an intervening newline before the
> word.
> [...]
> $ git diff --word-diff=plain /tmp/1.txt /tmp/2.txt
> diff --git a/tmp/1.txt b/tmp/2.txt
> index 8239f93..099fb80 100644
> --- a/tmp/1.txt
> +++ b/tmp/2.txt
> @@ -1,2 +1,2 @@
>     x = yz [-ab-]{+opt1+}
> {+    ac+} = [-cd ef-]{+pq opt2+}
> 
> $ cat /tmp/1.txt
>     x = yz
>     ab = cd ef
> 
> $ cat /tmp/2.txt
>     x = yz opt1
>     ac = pq opt2

The reason is that the implementation of word-diff does not treat
newline characters in any special way. They are "whitespace" like any
other character that is not captured by the word-diff patterns. The
whitespace following each word is recorded, but is disregarded when the
word-diff is computed. When the text is reconstructed for the output,
the recorded whitespace is printed only for unchanged and added words,
not for removed words (IIRC). Combine this with the fact that a change,
i.e., a removal paired with an addition, is printed with the removal
first, and you get the observed output: the removed "ab" is emitted
without any whitespace of its own, directly after "yz " and before the
added "opt1", and only then comes the newline recorded after "opt1".
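
To make the mechanism concrete, here is a rough Python model of the
behaviour described above. It is only a sketch, not Git's code: the
names tokenize() and word_diff() are made up, the word comparison uses
Python's difflib instead of Git's diff engine, every changed word gets
its own marker pair, and the sample text drops the indentation of the
original files.

```python
import difflib
import re


def tokenize(text):
    """Split text into (word, trailing_whitespace) pairs.

    A word is a run of non-whitespace characters. Newlines are ordinary
    whitespace here; leading whitespace before the first word is dropped.
    """
    return re.findall(r"(\S+)(\s*)", text)


def word_diff(old, new):
    """Return a plain word-diff of old vs. new under the simplified model."""
    old_tokens = tokenize(old)
    new_tokens = tokenize(new)
    old_words = [word for word, _ in old_tokens]
    new_words = [word for word, _ in new_tokens]

    out = []
    # The recorded whitespace plays no role in the comparison itself.
    matcher = difflib.SequenceMatcher(None, old_words, new_words,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Removed words come first and carry no whitespace at all.
            for word in old_words[i1:i2]:
                out.append("[-" + word + "-]")
        if tag in ("replace", "insert"):
            # Added words are followed by their recorded whitespace.
            for word, space in new_tokens[j1:j2]:
                out.append("{+" + word + "+}" + space)
        if tag == "equal":
            # Unchanged words use the whitespace recorded in the new text.
            for word, space in new_tokens[j1:j2]:
                out.append(word + space)
    return "".join(out)


if __name__ == "__main__":
    old = "x = yz\nab = cd ef\n"
    new = "x = yz opt1\nac = pq opt2\n"
    print(word_diff(old, new), end="")
```

In this model the first output line is "x = yz [-ab-]{+opt1+}": the
removed "ab" has no newline attached, so it is printed before the
newline that was recorded after the added "opt1".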

I don't see an easy solution for this without completely rewriting the
implementation.

-- Hannes