Hi Elijah
On 09/02/2023 09:09, Elijah Newren wrote:
Hi John and Phillip,
On Tue, Feb 7, 2023 at 9:05 AM John Cai <johncai86@xxxxxxxxx> wrote:
[...]
Perhaps I'm over simplifying but having read the issue you linked to I couldn't help feeling that the majority of users might be satisfied by just changing gitlab to use the patience algorithm when generating diffs.
Right, I recognize this is a judgment call that may be best left up to the list.
We don't have a way in GitLab to change the diff algorithm currently. Of course
that can be implemented outside of Git,
Well, the below doesn't allow users to make diffs better for
*individual* files of interest, but if you agree with me that we
should just make diffs better for all users automatically, it's a
two-line change in git.git that I'd love to eventually convince the
project to take (though obviously doing that would also require some
documentation changes and some good messaging in release notes and
whatnot). I've used it for a good long while, and had a few dozen
users using this patch too, all without complaint:
I'd support a change to either patience or histogram as the default
algorithm. My personal preference would be for the patience algorithm as
I think it generally gives nicer diffs in the cases that the two
disagree (see below, I've tried changing diff.algorithm to histogram a
few times and I always end up changing it back to patience pretty
quickly). However I can see there is an advantage in having "diff" and
"merge" use the same algorithm as users who diffing either side to the
merge base will see the same diff that the merge is using. The histogram
algorithm is known to produce sub-optimal diffs in certain cases[1] but
I'm not sure how much worse it is in that respect than any of the other
algorithms.
To see the differences between the output of patience and histogram
algorithms I diffed the output of "git log -p --no-merges
--diff-algorithm=patience" and "git log -p --no-merges
--diff-algorithm=histogram". The first three differences are
- 6c065f72b8 (http: support CURLOPT_PROTOCOLS_STR, 2023-01-16)
In get_curl_allowed_protocols() the patience algorithm shows the
change in the return statement more clearly
- 47cfc9bd7d (attr: add flag `--source` to work with tree-ish, 2023-01-14)
The histogram algorithm shows read_attr_from_index() being moved
whereas the patience algorithm does not making the diff easier to
follow.
- b0226007f0 (fsmonitor: eliminate call to deprecated FSEventStream
function, 2022-12-14)
In fsm_listen__stop_async() the histogram algorithm shows
data->shutdown_style = SHUTDOWN_EVENT;
being moved, which is not as clear as the patience output which
shows it as a context line.
I think there is a degree of personal preference when it comes to which
out of patience or histogram is best and the user can easily select
their preferred algorithm so I'd be happy with either.
Best Wishes
Phillip
[1]
https://lore.kernel.org/git/CAGZ79kZYO6hHiAM8Sfp3J=VX11c=0-7YDSx3_EAKt5-uvvt-Ew@xxxxxxxxxxxxxx/
diff --git a/diff.c b/diff.c
index 329eebf16a..77a46d5b7d 100644
--- a/diff.c
+++ b/diff.c
@@ -55,7 +55,7 @@ static int diff_relative;
static int diff_stat_graph_width;
static int diff_dirstat_permille_default = 30;
static struct diff_options default_diff_options;
-static long diff_algorithm;
+static long diff_algorithm = XDF_HISTOGRAM_DIFF;
static unsigned ws_error_highlight_default = WSEH_NEW;
static char diff_colors[][COLOR_MAXLEN] = {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index b298f220e0..2f663eab72 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1549,7 +1549,7 @@ test_expect_success 'short lines of opposite
sign do not get marked as moved' '
this line should be marked as oldMoved newMoved
unchanged 4
EOF
- test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+ test_expect_code 1 git diff --diff-algorithm=myers --no-index
--color --color-moved=zebra \
old.txt new.txt >output && cat output &&
grep -v index output | test_decode_color >actual &&
cat >expect <<-\EOF &&
I used histogram above rather than patience, since (a) it's what git's
merge backend uses, (b) it produces roughly similar results to
patience from a user perspective, (c) past testing has shown it to be
somewhat faster than patience, and (d) we've potentially got some
leads in how to speed up our histogram implementation from the README
over at https://github.com/pascalkuthe/imara-diff. But, if you really
wanted to use patience as the default, it'd also be an easy tweak.
Anyway, just some food for thought.