Re: git diff word diff bug??

On 20-Apr-2021, at 22:08, Count of San Francisco <countofsanfrancisco@xxxxxxxxx> wrote:
> 
> Hi All,
> 
> Here is my "git bugreport":
> 
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
>   git diff --word-diff=porcelain file0.txt file1.txt
>     or
>   git diff --word-diff file0.txt file1.txt
> 
> What did you expect to happen? (Expected behavior)
> 
>   I expected the diff for porcelain or default word-diff to be clear on which lines got removed and which changes belong to which line. I explain in more detail below.
> 
> What happened instead? (Actual behavior)
> 
>   The diff was not clear.
> 
> What's different between what you expected and what actually happened?
> 
>   The diff made it look like all the removed text was on one line and like a later change belonged to a different line, when in fact the later change was for the same line (i.e. the first line). More details below.
> 
> Anything else you want to add:
> 
> Here are the details to reproduce and more details on how I interpreted the diff. If I am writing a script to highlight changes or to do extra processing for my specific use case, my script would get confused as to what really changed.
> 
> file0.txt content:
> *** Begin Content *** --> this line is not in the actual file but just a marker here for clarity.
> The fox jumped over the wall.
> Blah1e32
> q432423
> qe23234
>  233
> 253
> 345235
> 
> 53243
> afsfffas
> *** End Content ****
> 
> file1.txt content:
> *** Begin Content ***
> The fox jumped over the river.
>   He made it over.
> *** End Content ****
> 
> git diff --word-diff file0.txt file1.txt produced this:
> diff --git a/file0.txt b/file1.txt
> index c8756ba..3413f10 100644
> --- a/file0.txt
> +++ b/file1.txt
> @@ -1,11 +1,2 @@
> The fox jumped over the [-wall.-]
> [-Blah1e32-]
> [-q432423-]
> [-qe23234-]
> [- 233-]
> [-253-]
> [-345235-]
> 
> [-53243-]
> [-afsfffas-]{+river.+}
> {+  He made it over.+}

In my experience, git diff prefers to bundle up a series of
deletions or additions into a group if they all have the same
word delimiter. The way I would interpret this diff is as the
steps needed to get from file0 to the state of file1 while
moving left to right through file0, minimising the number of
times file1 has to be consulted to know what to do next.

Here it would be:
"Delete all the words from 'wall.' up to 'afsfffas', and then add
'river.' and '  He made it over.'".
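To make that reading concrete, here is a rough Python sketch that
pulls the deleted and added pieces out of the plain --word-diff
hunk quoted above. The function name split_markers is my own, and
the sketch assumes the file contents never contain the "[-", "-]",
"{+" or "+}" markers themselves, so treat it as an illustration
rather than a robust parser.

```python
import re

# Plain --word-diff hunk body, copied from the output quoted in this thread.
HUNK = """The fox jumped over the [-wall.-]
[-Blah1e32-]
[-q432423-]
[-qe23234-]
[- 233-]
[-253-]
[-345235-]

[-53243-]
[-afsfffas-]{+river.+}
{+  He made it over.+}
"""


def split_markers(hunk: str):
    """Return (removed_pieces, added_pieces) found in a plain word-diff hunk.

    Hypothetical helper: it only looks for the default [-...-] and {+...+}
    markers and will misfire if the diffed text contains those sequences.
    """
    # '.' does not match a newline, so each match stays on one output line.
    removed = re.findall(r"\[-(.*?)-\]", hunk)
    added = re.findall(r"\{\+(.*?)\+\}", hunk)
    return removed, added


removed, added = split_markers(HUNK)
print("deleted from", repr(removed[0]), "up to", repr(removed[-1]))
print("then added:", added)
```

All the deleted pieces form one run that starts at "wall." and the
added pieces follow that run as a whole, which is why "river."
shows up only after "afsfffas".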

> The diff above does not make it clear that the "{+river+}" is really to be appended (or related) to the first line.
> I expected the first diff line to look like this:
> The fox jumped over the [-wall.-]{+river+} and the rest of the lines are delete lines.
> 
> git diff --word-diff=porcelain file0.txt file1.txt produced this:
> diff --git a/file0.txt b/file1.txt
> index c8756ba..3413f10 100644
> --- a/file0.txt
> +++ b/file1.txt
> @@ -1,11 +1,2 @@
>  The fox jumped over the
> -wall.
> ~
> -Blah1e32
> ~
> -q432423
> ~
> -qe23234
> ~
> - 233
> ~
> -253
> ~
> -345235
> ~
> ~
> -53243
> ~
> -afsfffas
> +river.
> ~
> +  He made it over.
> ~
> 
> This is even harder to interpret. The git diff --help documentation says that "Newlines in the input are represented by a tilde ~ on a line of its own". So a script would see the '~' character and interpret that as a new line. The script would then mistake the "+river" for a change on a different line. The git diff --help documentation does not explain what to do in this scenario.
> 
> I expected this:
>  The fox jumped over the
> -wall.
> +river.
> ~

This is also consistent with the behaviour I mentioned above.
A script would need to interpret this as:
delete "wall."       (this starts the streak of deletions)
go to next line
delete "Blah1e32"
...

and as soon as it sees a '+', that is, an addition, it knows
the series of deletions is over, so it adds "river." after
the last text that was common to both files, that is,
"The fox jumped over the".
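A script along those lines could look roughly like the Python
sketch below. It is only my reading of the porcelain output shown
in this thread, not documented behaviour: it assumes that a '~'
sitting between two '-' lines is a newline inside the deleted text,
that a '~' between two '+' lines is a newline inside the added
text, and that any other '~' is a newline shared by both sides.
The name group_word_diff and the tuple layout are made up for the
example, and the input is the hunk body only (no headers).

```python
from typing import List, Tuple

# (context just before the change, removed text, added text)
Change = Tuple[str, str, str]

# Body of the --word-diff=porcelain hunk as quoted in this thread.
EXAMPLE = [
    " The fox jumped over the", "-wall.", "~", "-Blah1e32", "~",
    "-q432423", "~", "-qe23234", "~", "- 233", "~", "-253", "~",
    "-345235", "~", "~", "-53243", "~", "-afsfffas", "+river.", "~",
    "+  He made it over.", "~",
]


def group_word_diff(lines: List[str]) -> List[Change]:
    """Group porcelain word-diff lines into (context, removed, added)."""
    changes: List[Change] = []
    context, removed, added = "", "", ""
    prev = " "  # marker of the most recent non-tilde line

    def flush() -> None:
        # Emit the pending change, if any, and start collecting a new one.
        nonlocal removed, added
        if removed or added:
            changes.append((context, removed, added))
        removed, added = "", ""

    for i, line in enumerate(lines):
        if not line:
            continue  # not expected in porcelain output; skip defensively
        if line == "~":
            # Marker of the next non-tilde line (context if there is none).
            nxt = next((l[0] for l in lines[i + 1:] if l and l != "~"), " ")
            if prev == nxt == "-":
                removed += "\n"   # newline inside the run of deletions
            elif prev == nxt == "+":
                added += "\n"     # newline inside the run of additions
            else:
                flush()           # shared newline: the change ends here
                context = ""
            continue
        marker, text = line[0], line[1:]
        if marker == " ":
            flush()
            context = text
        elif marker == "-":
            if added:
                flush()           # a deletion after additions starts a new change
            removed += text
        elif marker == "+":
            added += text
        prev = marker
    flush()
    return changes


for ctx, old, new in group_word_diff(EXAMPLE):
    print("after   :", repr(ctx))
    print("removed :", repr(old))
    print("added   :", repr(new))
```

For the example in this thread it reports a single change whose
context is "The fox jumped over the", which ties "river." back to
the first line. Since whitespace differences between words are not
shown in word-diff output, a script like this cannot always
reconstruct file0 exactly.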

> Is this a bug? If not, how do I make the distinction that the {+river+} (in the first case) and the +river (in the 2nd case) is really for the first line?

I do not think this is a bug, because it does not really
deviate from any specified behaviour. But I do see the source
of confusion.

I hope that explains it well enough.

> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
> 
> 
> [System Info]
> git version:
> git version 2.30.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Darwin 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
> compiler info: clang: 12.0.0 (clang-1200.0.32.28)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /usr/local/bin/bash
> 
> 
> [Enabled Hooks]
> not run from a git repository - no hooks to show
> 

--
Atharva Raykar




