On Fri, Nov 18, 2016 at 03:40:22PM -0800, Matthieu S wrote:

> Why is the speed so different if one uses --word-diff instead of
> --word-diff-regex= ? Is it just because my expression is (slightly)
> more complex than the default one (split on period instead of only
> whitespace) ? Or is it that the default word-diff is implemented
> differently/more efficiently? How can I overcome this speed slowdown?

I think it's probably both. See diff.c:find_word_boundaries(). If
there's no regex, we use a simple loop over isspace() to find the
boundaries. I don't recall anybody measuring the performance before, but
I'm not surprised to hear that matching a regex is slower.

If I look at the output of "perf", though, it looks like we also spend a
lot more time in xdl_clean_mmatch(). Which isn't surprising. Your regex
treats commas as boundaries, which is going to generate a lot more
matches for this particular data set (though the output is the same, I
think, because of the nature of the change).

I would have expected "--word-diff-regex=[^[:space:]]" to be faster than
your regex, though, and it does not seem to be.

-Peff
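
To illustrate the two splitting strategies mentioned above, here is a rough
Python sketch. It is not git's C code: the function names, the sample line,
and the comma-splitting pattern are all made up for illustration (the regex
from the original report is not shown here), so treat it as a toy model only.

```python
import re

def split_on_whitespace(text):
    """Return (start, end) spans of runs of non-whitespace characters.

    Hypothetical stand-in for the no-regex path: a plain loop over
    isspace(), with no regex engine involved.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        # Skip whitespace between words.
        while i < n and text[i].isspace():
            i += 1
        start = i
        # Consume the word itself.
        while i < n and not text[i].isspace():
            i += 1
        if i > start:
            spans.append((start, i))
    return spans

def split_on_regex(text, pattern):
    """Return (start, end) spans of every non-empty match of pattern."""
    return [m.span() for m in re.finditer(pattern, text) if m.end() > m.start()]

# Made-up sample data with commas inside a whitespace-delimited field.
line = "alpha,beta,gamma delta"

ws_words = [line[s:e] for s, e in split_on_whitespace(line)]

# Assumed pattern: a word is a run of characters that are neither
# whitespace nor a comma, so commas act as boundaries.
re_words = [line[s:e] for s, e in split_on_regex(line, r"[^\s,]+")]

print(ws_words)
print(re_words)
```

The comma-splitting pattern yields more, shorter words from the same input,
so besides the cost of running the regex itself there are simply more tokens
for the diff machinery to compare.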