Skip searching for better indentation heuristics if we'd slide a hunk more than its size. This is the easiest fix proposed in the analysis[1] in response to a patch that mercurial took for xdiff to limit searching by a constant. Using a performance test as: #!python open('a', 'w').write(" \n" * 1000000) open('b', 'w').write(" \n" * 1000001) This patch reduces the execution of "git diff --no-index a b" from 0.70s to 0.31s. However limiting the sliding to the size of the diff hunk, which was proposed as a solution (that I found easiest to implement for now) is not optimal for cases like open('a', 'w').write(" \n" * 1000000) open('b', 'w').write(" \n" * 2000000) as then we'd still slide 1000000 times. In addition to limiting the sliding to size of the hunk, also limit by a constant. Choose 100 lines as the constant as that fits more than a screen, which really means that the diff sliding is probably not providing a lot of benefit anyway. [1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@xxxxxxxxxxxx/ Reported-by: Jun Wu <quark@xxxxxx> Analysis-by: Michael Haggerty <mhagger@xxxxxxxxxxxx> Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx> --- > Plus, it should always give the right answer. I was tempted to do just that, but I caved. The diff is correct, and the hunk sliding is purely to appease the visual aspect of humans looking at diffs. If your diff can slide more than a monitor height, you're not interested in the best slided diff, but something else is going on. > There are cases where it will be > more expensive than a fixed `MAX_BORING`, but I bet on average it will > be faster. So I did both, settling for performance as the utmost desire. ;-) xdiff/xdiffi.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 0de1ef463bf..91e98ee9869 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -591,6 +591,11 @@ static void measure_split(const xdfile_t *xdf, long split, */ #define INDENT_WEIGHT 60 +/* + * How far do we slide a hunk at most? + */ +#define INDENT_HEURISTIC_MAX_SLIDING 100 + /* * Compute a badness score for the hypothetical split whose measurements are * stored in m. The weight factors were determined empirically using the tools and @@ -903,7 +908,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long shift, best_shift = -1; struct split_score best_score; - for (shift = earliest_end; shift <= g.end; shift++) { + shift = earliest_end; + if (g.end - groupsize - 1 > shift) + shift = g.end - groupsize - 1; + if (g.end - INDENT_HEURISTIC_MAX_SLIDING > shift) + shift = g.end - INDENT_HEURISTIC_MAX_SLIDING; + for (; shift <= g.end; shift++) { struct split_measurement m; struct split_score score = {0, 0}; -- 2.18.0.345.g5c9ce644c3-goog