Re: [PATCH 8/8] diff: improve positioning of add/delete blocks in diffs

On Thu, Aug 04, 2016 at 12:00:36AM +0200, Michael Haggerty wrote:

> This table shows the number of diff slider groups that were positioned
> differently than the human-generated values, for various repositories.
> "default" is the default "git diff" algorithm. "compaction" is Git 2.9.0
> with the `--compaction-heuristic` option "indent" is an earlier,

s/option/&./

>  static int diff_detect_rename_default;
> +static int diff_indent_heuristic; /* experimental */
>  static int diff_compaction_heuristic; /* experimental */

These two flags are mutually exclusive in the xdiff code, so we should
probably handle that here.

TBH, I do not care that much what:

  [diff]
  compactionHeuristic = true
  indentHeuristic = true

does. But right now:

  git config diff.compactionHeuristic true
  git show --indent-heuristic

still prefers the compaction heuristic, which I think is objectively
wrong.

So perhaps we need a single variable:

  enum {
    DIFF_HEURISTIC_COMPACTION,
    DIFF_HEURISTIC_INDENT
  } diff_heuristic;

and set it in last-one-wins fashion (it would be nice if the config and
command line options were shaped the same way so it's clear to the user
that they are exclusive, but we may have to keep --compaction-heuristic
around for compatibility, as an alias for --diff-heuristic=compaction).
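
To make the last-one-wins idea concrete, here is a rough sketch of what I
have in mind. It is not meant as the actual patch: the helper names, the
extra "none" value, the --diff-heuristic=indent spelling, and the choice
that a false config value only clears the heuristic it names are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* One variable instead of two flags, so both heuristics cannot be on. */
enum diff_heuristic_kind {
	DIFF_HEURISTIC_NONE = 0,	/* assumed "no heuristic" value */
	DIFF_HEURISTIC_COMPACTION,
	DIFF_HEURISTIC_INDENT
};

static enum diff_heuristic_kind diff_heuristic = DIFF_HEURISTIC_NONE;

/* Turn one heuristic on, or off again if it is the one currently selected. */
static void set_heuristic(enum diff_heuristic_kind which, int enable)
{
	if (enable)
		diff_heuristic = which;
	else if (diff_heuristic == which)
		diff_heuristic = DIFF_HEURISTIC_NONE;
}

/* Hypothetical config callback; returns 1 if the key was recognized. */
static int heuristic_config(const char *key, int value)
{
	assert(key != NULL);
	if (!strcmp(key, "diff.compactionheuristic"))
		set_heuristic(DIFF_HEURISTIC_COMPACTION, value);
	else if (!strcmp(key, "diff.indentheuristic"))
		set_heuristic(DIFF_HEURISTIC_INDENT, value);
	else
		return 0;
	return 1;
}

/* Hypothetical option parser; returns 1 if the argument was recognized. */
static int heuristic_option(const char *arg)
{
	assert(arg != NULL);
	if (!strcmp(arg, "--compaction-heuristic") ||
	    !strcmp(arg, "--diff-heuristic=compaction"))
		set_heuristic(DIFF_HEURISTIC_COMPACTION, 1);
	else if (!strcmp(arg, "--indent-heuristic") ||
		 !strcmp(arg, "--diff-heuristic=indent"))
		set_heuristic(DIFF_HEURISTIC_INDENT, 1);
	else
		return 0;
	return 1;
}
```

Since config is read before the command line is parsed, setting
diff.compactionHeuristic and then running with --indent-heuristic would
end up with the indent heuristic, because the later setting simply
overwrites the earlier one.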

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 642cce1..ee3d812 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -45,6 +45,7 @@ my ($diff_new_color) =
>  my $normal_color = $repo->get_color("", "reset");
>  
>  my $diff_algorithm = $repo->config('diff.algorithm');
> +my $diff_indent_heuristic = $repo->config_bool('diff.indentheuristic');
>  my $diff_compaction_heuristic = $repo->config_bool('diff.compactionheuristic');

Nice touch.

Unfortunately the mutual-exclusivity handling will probably bleed over
to here, too.

> +/*
> + * If a line is indented more than this, get_indent() just returns this value.
> + * This avoids having to do absurd amounts of work for data that are not
> + * human-readable text, and also ensures that the output of get_indent fits within
> + * an int.
> + */
> +#define MAX_INDENT 200

Speaking of absurd amounts of work, I was curious if there was a
noticeable performance penalty for using this heuristic (just because
it's a lot more complicated than the others). I couldn't detect any
differences running "git log -p --no-merges -3000" on git.git with no
heuristic, compaction, and indent. There may be other repositories that
behave more pathologically (it looks like having 20 blank lines at the
end of each hunk?), but I'd guess in most cases this will be
drowned out in the noise of doing the actual diff.
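
For reference, the cap described in the quoted comment bounds the work
per line roughly as in the sketch below. This is only an illustration and
not the code from the patch: the function name, the plain
pointer-plus-length signature, the 8-column tab stops, and the -1 return
for whitespace-only lines are assumptions. Only the MAX_INDENT value
comes from the quoted hunk.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_INDENT 200

/*
 * Return the indentation of a line in columns, capped at MAX_INDENT,
 * or -1 if the line holds nothing but whitespace.
 */
static int capped_indent(const char *line, size_t len)
{
	size_t i;
	int ret = 0;

	assert(line != NULL || len == 0);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)line[i];

		if (!isspace(c))
			return ret;
		if (c == ' ')
			ret += 1;
		else if (c == '\t')
			ret += 8 - ret % 8;	/* advance to the next tab stop */
		/* other whitespace characters do not change the column */

		/* Stop scanning early, so long whitespace runs cost little. */
		if (ret >= MAX_INDENT)
			return MAX_INDENT;
	}

	return -1;	/* only whitespace */
}
```

The early return is what keeps the loop from walking arbitrarily long
runs of leading whitespace, and it also keeps the result well within an
int.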

> +#define START_OF_FILE_BONUS 9
> +#define END_OF_FILE_BONUS 46
> +#define TOTAL_BLANK_WEIGHT 4
> +#define PRE_BLANK_WEIGHT 16
> +#define RELATIVE_INDENT_BONUS -1
> +#define RELATIVE_INDENT_HAS_BLANK_BONUS 15
> +#define RELATIVE_OUTDENT_BONUS -19
> +#define RELATIVE_OUTDENT_HAS_BLANK_BONUS 2
> +#define RELATIVE_DEDENT_BONUS -63
> +#define RELATIVE_DEDENT_HAS_BLANK_BONUS 50

I see there is a comment below here mentioning that these are empirical
voodoo, but it might be worth one at the top (or just moving these below
the comment) because the comment looks like it's just associated with
the function (and these are sufficiently bizarre that anybody reading is
going to double-take on them).

> +        return 10 * score - bonus;

I don't mind this "10" not being a #define constant, but after
reading the exchange between you and Stefan, I think it would be nice to
describe what it is in a comment. The rest of the function is commented
so nicely that this one left me thinking "huh?" upon seeing the "10".

The rest looks sane to me, though I am not sure I have absorbed all the
implications. IMHO the most interesting thing is the actual results,
though.

-Peff