Re: Fwd: git diff with “--word-diff-regex” extremely slow compared to “--word-diff”?

Jeff King <peff@xxxxxxxx> · Sun, 20 Nov 2016 15:17:44 -0500

On Fri, Nov 18, 2016 at 03:40:22PM -0800, Matthieu S wrote:

> Why is the speed so different if one uses --word-diff instead of
> --word-diff-regex= ? Is it just because my expression is (slightly)
> more complex than the default one (split on period instead of only
> whitespace) ? Or is it that the default word-diff is implemented
> differently/more efficiently? How can I overcome this speed slowdown?

I think it's probably both.

See diff.c:find_word_boundaries(). If there's no regex, we use a simple
loop over isspace() to find the boundaries. I don't recall anybody
measuring the performance before, but I'm not surprised to hear that
matching a regex is slower.

Looking at the output of "perf", though, we also spend a lot more time
in xdl_clean_mmatch(), which isn't surprising: your regex treats commas
as boundaries, so it is going to generate a lot more matches for this
particular data set (though I think the output is the same, because of
the nature of the change).

I would have expected "--word-diff-regex=[^[:space:]]" to be faster than
your regex, though, and it does not seem to be.

-Peff