Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Elijah

On 09/02/2023 09:09, Elijah Newren wrote:
Hi John and Phillip,

On Tue, Feb 7, 2023 at 9:05 AM John Cai <johncai86@xxxxxxxxx> wrote:

[...]
Perhaps I'm over simplifying but having read the issue you linked to I couldn't help feeling that the majority of users might be satisfied by just changing gitlab to use the patience algorithm when generating diffs.

Right, I recognize this is a judgment call that may be best left up to the list.

We don't have a way in GitLab to change the diff algorithm currently. Of course
that can be implemented outside of Git,

Well, the below doesn't allow users to make diffs better for
*individual* files of interest, but if you agree with me that we
should just make diffs better for all users automatically, it's a
two-line change in git.git that I'd love to eventually convince the
project to take (though obviously doing that would also require some
documentation changes and some good messaging in release notes and
whatnot).  I've used it for a good long while, and had a few dozen
users using this patch too, all without complaint:

I'd support a change to either patience or histogram as the default algorithm. My personal preference would be for the patience algorithm as I think it generally gives nicer diffs in the cases that the two disagree (see below, I've tried changing diff.algorithm to histogram a few times and I always end up changing it back to patience pretty quickly). However I can see there is an advantage in having "diff" and "merge" use the same algorithm as users who diffing either side to the merge base will see the same diff that the merge is using. The histogram algorithm is known to produce sub-optimal diffs in certain cases[1] but I'm not sure how much worse it is in that respect than any of the other algorithms.

To see the differences between the output of patience and histogram algorithms I diffed the output of "git log -p --no-merges --diff-algorithm=patience" and "git log -p --no-merges --diff-algorithm=histogram". The first three differences are

- 6c065f72b8 (http: support CURLOPT_PROTOCOLS_STR, 2023-01-16)
  In get_curl_allowed_protocols() the patience algorithm shows the
  change in the return statement more clearly

- 47cfc9bd7d (attr: add flag `--source` to work with tree-ish, 2023-01-14)
   The histogram algorithm shows read_attr_from_index() being moved
   whereas the patience algorithm does not making the diff easier to
   follow.

- b0226007f0 (fsmonitor: eliminate call to deprecated FSEventStream function, 2022-12-14)
  In fsm_listen__stop_async() the histogram algorithm shows
  data->shutdown_style = SHUTDOWN_EVENT;
  being moved, which is not as clear as the patience output which
  shows it as a context line.

I think there is a degree of personal preference when it comes to which out of patience or histogram is best and the user can easily select their preferred algorithm so I'd be happy with either.

Best Wishes

Phillip

[1] https://lore.kernel.org/git/CAGZ79kZYO6hHiAM8Sfp3J=VX11c=0-7YDSx3_EAKt5-uvvt-Ew@xxxxxxxxxxxxxx/

diff --git a/diff.c b/diff.c
index 329eebf16a..77a46d5b7d 100644
--- a/diff.c
+++ b/diff.c
@@ -55,7 +55,7 @@ static int diff_relative;
  static int diff_stat_graph_width;
  static int diff_dirstat_permille_default = 30;
  static struct diff_options default_diff_options;
-static long diff_algorithm;
+static long diff_algorithm = XDF_HISTOGRAM_DIFF;
  static unsigned ws_error_highlight_default = WSEH_NEW;

  static char diff_colors[][COLOR_MAXLEN] = {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index b298f220e0..2f663eab72 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1549,7 +1549,7 @@ test_expect_success 'short lines of opposite
sign do not get marked as moved' '
         this line should be marked as oldMoved newMoved
         unchanged 4
         EOF
-       test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+       test_expect_code 1 git diff --diff-algorithm=myers --no-index
--color --color-moved=zebra \
                 old.txt new.txt >output && cat output &&
         grep -v index output | test_decode_color >actual &&
         cat >expect <<-\EOF &&


I used histogram above rather than patience, since (a) it's what git's
merge backend uses, (b) it produces roughly similar results to
patience from a user perspective, (c) past testing has shown it to be
somewhat faster than patience, and (d) we've potentially got some
leads in how to speed up our histogram implementation from the README
over at https://github.com/pascalkuthe/imara-diff.  But, if you really
wanted to use patience as the default, it'd also be an easy tweak.

Anyway, just some food for thought.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux