Re: Fwd: git diff with “--word-diff-regex” extremely slow compared to “--word-diff”?

Jeff King <peff@xxxxxxxx> · Tue, 22 Nov 2016 14:26:58 -0500

On Tue, Nov 22, 2016 at 10:08:33AM -0800, Matthieu S wrote:

> You are right, I should have compared with the same regex, and indeed,
> --word-diff-regex=[^[:space:]] is also much slower than just
> --word-diff, although they do the same job. Maybe this is a hint that
> the --word-diff-regex code could be made faster?

Maybe. If most of the time is spent in the regex engine, there may not
be much we can do. But perhaps there is something in the surrounding
code that can be improved. Looking at find_word_boundaries() (and this
is the first time I've done so), it does look like we regex-match the
whole buffer, and only then find the end-of-line. Now that we have
regexec_buf(), it might be possible to constrain the regex buffer more.
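
To make that idea concrete, here is a rough, untested-against-git sketch of
what "constrain the regex buffer" could look like: find the end of the
current line first, and only hand that line to the regex engine. This is
not the code in diff.c. The helper name next_word_by_line() and its
signature are made up for illustration, and it calls regexec() with
REG_STARTEND directly (a non-POSIX extension found in glibc and the BSDs)
rather than going through git's regexec_buf() wrapper.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#ifndef REG_STARTEND
#error "this sketch relies on the non-POSIX REG_STARTEND extension"
#endif

/*
 * Hypothetical helper (not git's find_word_boundaries()): find the next
 * word at or after buf[*begin], running the regex over one line at a time
 * instead of over the whole remaining buffer.
 *
 * Returns 0 and sets [*begin, *end) to the word on success, or -1 if no
 * word is left. An empty match is treated as "no word on this line",
 * which is a simplification.
 */
static int next_word_by_line(const char *buf, size_t size, regex_t *re,
			     size_t *begin, size_t *end)
{
	size_t pos = *begin;

	while (pos < size) {
		/* Locate the end of the current line before matching. */
		const char *nl = memchr(buf + pos, '\n', size - pos);
		size_t line_len = nl ? (size_t)(nl - (buf + pos)) : size - pos;
		regmatch_t m[1];

		/* With REG_STARTEND, m[0] bounds the text to be searched. */
		m[0].rm_so = 0;
		m[0].rm_eo = (regoff_t)line_len;
		if (line_len > 0 &&
		    !regexec(re, buf + pos, 1, m, REG_STARTEND) &&
		    m[0].rm_eo > m[0].rm_so) {
			*begin = pos + (size_t)m[0].rm_so;
			*end = pos + (size_t)m[0].rm_eo;
			return 0;
		}
		pos += line_len + 1; /* skip this line and its newline */
	}
	return -1;
}
```

Whether this is actually faster depends on where the time goes. If the
regex engine itself dominates, shrinking the buffer it sees may not help
much. Note also that matching per line changes what "^" and "$" anchor to,
so a real patch would have to consider that.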

> I have a small understanding of git, but is git diff computing the
> diff value for the whole file, and then showing in the terminal the 10
> first values? In some cases, it seems to be a lot of unnecessary
> computation! Is there any possibility to ask git-diff to only compare
> say the first 100 lines? Or compute only when necessary, i.e.
> when "enter" is prompted in the console?

Git always computes the diff for the whole file. The paging is done by
an external program. So no, there's no easy way to do it incrementally
as the user interacts with the pager, as the pager does not communicate
back to git in any way. However, git should generally be streaming out
results (and the pager showing them) as they're computed, so in an ideal
world you get output immediately, and then the pager buffers the rest of
it while you're reading the first page.

Git does have to look at the whole file in order to do the initial
line-by-line diff, so it would be hard to make that incremental. The
word-coloring, however, could be done incrementally for each hunk. I
would have assumed that is already how it is done, but I didn't dig
into it.

-Peff