Thanks Jeff for the answer!

You are right, I should have compared with the same regex, and indeed,
--word-diff-regex=[^[:space:]] is also much slower than just --word-diff,
although they do the same job. Maybe this is a hint that the
--word-diff-regex code could be made faster?

I have only a limited understanding of git, but does git diff compute the
diff for the whole file and then show only the first 10 values in the
terminal? In some cases, that seems like a lot of unnecessary computation!
Is there any way to ask git-diff to compare only, say, the first 100
lines? Or to compute only when necessary, i.e. when "enter" is pressed in
the console?

Thanks!
Matthieu

2016-11-20 12:17 GMT-08:00 Jeff King <peff@xxxxxxxx>:
> On Fri, Nov 18, 2016 at 03:40:22PM -0800, Matthieu S wrote:
>
>> Why is the speed so different if one uses --word-diff instead of
>> --word-diff-regex= ? Is it just because my expression is (slightly)
>> more complex than the default one (split on period instead of only
>> whitespace) ? Or is it that the default word-diff is implemented
>> differently/more efficiently? How can I overcome this speed slowdown?
>
> I think it's probably both.
>
> See diff.c:find_word_boundaries(). If there's no regex, we use a simple
> loop over isspace() to find the boundaries. I don't recall anybody
> measuring the performance before, but I'm not surprised to hear that
> matching a regex is slower.
>
> If I look at the output of "perf", though, it looks like we also spend a
> lot more time in xdl_clean_mmatch(). Which isn't surprising. Your regex
> treats commas as boundaries, which is going to generate a lot more
> matches for this particular data set (though the output is the same, I
> think, because of the nature of the change).
>
> I would have expected "--word-diff-regex=[^[:space:]]" to be faster than
> your regex, though, and it does not seem to be.
>
> -Peff
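
For illustration only, the two boundary-finding strategies Jeff describes
(a plain loop over isspace() versus repeated regex matching) can be
sketched in Python. This is not git's code: the function names and the
sample string are made up for the example, and Python's \S stands in for
the POSIX class [^[:space:]], which Python's re module does not support.
The sketch also shows one thing that may matter for the timing: a pattern
without a trailing "+" matches one character at a time, so it yields many
more "words" than the whitespace loop does.

```python
import re


def word_spans_isspace(text):
    """Return (start, end) spans of non-whitespace runs using a plain loop."""
    spans = []
    i, n = 0, len(text)
    while i < n:
        # Skip any run of whitespace.
        while i < n and text[i].isspace():
            i += 1
        start = i
        # Consume one run of non-whitespace characters.
        while i < n and not text[i].isspace():
            i += 1
        if i > start:
            spans.append((start, i))
    return spans


def word_spans_regex(text, pattern):
    """Return spans of every non-overlapping, non-empty regex match."""
    return [m.span() for m in re.finditer(pattern, text) if m.end() > m.start()]


if __name__ == "__main__":
    sample = "1.5,2.5 3.5,4.5\n5.5"  # hypothetical data, not from the thread
    print("isspace loop:", word_spans_isspace(sample))
    print(r"regex \S+   :", word_spans_regex(sample, r"\S+"))
    print(r"regex \S    :", len(word_spans_regex(sample, r"\S")), "spans")
```

With "\S+" the regex version finds the same spans as the loop; with "\S"
alone every non-space character becomes its own span. If git treats each
non-overlapping match as a word, as its documentation describes, then
[^[:space:]] without "+" would hand the diff machinery far more tokens
than plain --word-diff, which would be one plausible contributor to the
slowdown.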