On Thu, Aug 04, 2016 at 12:00:36AM +0200, Michael Haggerty wrote:

> This table shows the number of diff slider groups that were positioned
> differently than the human-generated values, for various repositories.
> "default" is the default "git diff" algorithm. "compaction" is Git 2.9.0
> with the `--compaction-heuristic` option "indent" is an earlier,

s/option/&./

>  static int diff_detect_rename_default;
> +static int diff_indent_heuristic; /* experimental */
>  static int diff_compaction_heuristic; /* experimental */

These two flags are mutually exclusive in the xdiff code, so we should
probably handle that here.

TBH, I do not care that much what:

  [diff]
  compactionHeuristic = true
  indentHeuristic = true

does. But right now:

  git config diff.compactionHeuristic true
  git show --indent-heuristic

still prefers the compaction heuristic, which I think is objectively
wrong.

So perhaps we need a single variable:

  enum {
	DIFF_HEURISTIC_COMPACTION,
	DIFF_HEURISTIC_INDENT
  } diff_heuristic;

and set it in last-one-wins fashion (it would be nice if the config and
command line options were shaped the same way so it's clear to the user
that they are exclusive, but we may have to keep --compaction-heuristic
around for compatibility, as an alias for --diff-heuristic=compaction).

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 642cce1..ee3d812 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -45,6 +45,7 @@ my ($diff_new_color) =
>  my $normal_color = $repo->get_color("", "reset");
>
>  my $diff_algorithm = $repo->config('diff.algorithm');
> +my $diff_indent_heuristic = $repo->config_bool('diff.indentheuristic');
>  my $diff_compaction_heuristic = $repo->config_bool('diff.compactionheuristic');

Nice touch. Unfortunately the mutual-exclusivity handling will probably
bleed over to here, too.

> +/*
> + * If a line is indented more than this, get_indent() just returns this value.
> + * This avoids having to do absurd amounts of work for data that are not
> + * human-readable text, and also ensures that the output of get_indent fits within
> + * an int.
> + */
> +#define MAX_INDENT 200

Speaking of absurd amounts of work, I was curious if there was a
noticeable performance penalty for using this heuristic (just because
it's a lot more complicated than the others). I couldn't detect any
differences running "git log -p --no-merges -3000" on git.git with no
heuristic, compaction, and indent. There may be other repositories that
behave more pathologically (it looks like having 20 blank lines at the
end of each hunk?), but I'd guess in most cases this will be drowned
out in the noise of doing the actual diff.

> +#define START_OF_FILE_BONUS 9
> +#define END_OF_FILE_BONUS 46
> +#define TOTAL_BLANK_WEIGHT 4
> +#define PRE_BLANK_WEIGHT 16
> +#define RELATIVE_INDENT_BONUS -1
> +#define RELATIVE_INDENT_HAS_BLANK_BONUS 15
> +#define RELATIVE_OUTDENT_BONUS -19
> +#define RELATIVE_OUTDENT_HAS_BLANK_BONUS 2
> +#define RELATIVE_DEDENT_BONUS -63
> +#define RELATIVE_DEDENT_HAS_BLANK_BONUS 50

I see there is a comment below here mentioning that these are empirical
voodoo, but it might be worth one at the top (or just moving these below
the comment) because the comment looks like it's just associated with
the function (and these are sufficiently bizarre that anybody reading is
going to double-take on them).

> +	return 10 * score - bonus;

I don't mind this "10" not being a #define constant, but after reading
the exchange between you and Stefan, I think it would be nice to
describe what it is in a comment. The rest of the function is commented
so nicely that this one left me thinking "huh?" upon seeing the "10".

The rest looks sane to me, though I am not sure I have absorbed all the
implications. IMHO the most interesting thing is the actual results,
though.

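To make the last-one-wins suggestion above a bit more concrete, here is
a minimal sketch. It is not the actual git code: the enum tag,
DIFF_HEURISTIC_NONE, both helper functions, and the
--diff-heuristic=<name> spelling are hypothetical, and it assumes that
config is read before the command line is parsed.

```c
#include <string.h>

/* Hypothetical names throughout; a sketch, not the real diff.c. */
enum diff_heuristic {
	DIFF_HEURISTIC_NONE = 0,
	DIFF_HEURISTIC_COMPACTION,
	DIFF_HEURISTIC_INDENT
};

/*
 * Config side: a true value selects that heuristic, replacing whatever
 * was selected before. A false value only clears the heuristic it names.
 */
static void heuristic_from_config(enum diff_heuristic *h,
				  const char *key, int value)
{
	enum diff_heuristic which;

	if (!strcmp(key, "diff.compactionheuristic"))
		which = DIFF_HEURISTIC_COMPACTION;
	else if (!strcmp(key, "diff.indentheuristic"))
		which = DIFF_HEURISTIC_INDENT;
	else
		return;

	if (value)
		*h = which;
	else if (*h == which)
		*h = DIFF_HEURISTIC_NONE;
}

/*
 * Command-line side: returns 1 if arg was recognized, 0 otherwise.
 * Later options simply overwrite earlier ones (and the config).
 */
static int heuristic_from_arg(enum diff_heuristic *h, const char *arg)
{
	static const char prefix[] = "--diff-heuristic=";
	const char *name;

	if (!strcmp(arg, "--indent-heuristic")) {
		*h = DIFF_HEURISTIC_INDENT;
		return 1;
	}
	if (!strcmp(arg, "--compaction-heuristic")) {
		*h = DIFF_HEURISTIC_COMPACTION;
		return 1;
	}
	if (strncmp(arg, prefix, sizeof(prefix) - 1))
		return 0;

	/* Hypothetical unified spelling: --diff-heuristic=<name> */
	name = arg + sizeof(prefix) - 1;
	if (!strcmp(name, "indent"))
		*h = DIFF_HEURISTIC_INDENT;
	else if (!strcmp(name, "compaction"))
		*h = DIFF_HEURISTIC_COMPACTION;
	else
		return 0;
	return 1;
}
```

With something shaped like this, diff.compactionHeuristic=true followed
by "git show --indent-heuristic" ends up with the indent heuristic,
because the command line is processed last.
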
-Peff