Re: Fwd: git diff with “--word-diff-regex” extremely slow compared to “--word-diff”?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Jeff for the answer!

You are right, I should have compared with the same regex, and indeed,
--word-diff-regex=[^[:space:]] is also much slower than just
--word-diff, although they do the same job. Maybe this is a hint that
the --word-diff-regex code could be made faster?

I have a small understanding of git, but is git diff computing the
diff value for the whole file, and then showing in the terminal the 10
first values? In some cases, it seems to be a lot of unnecessary
computation! Is there any possibility to ask git-diff to only compare
say the first 100 lines? Or compute only when necessary, i.e.
when"enter" is prompted in the console?

Thanks!

Matthieu

2016-11-20 12:17 GMT-08:00 Jeff King <peff@xxxxxxxx>:
> On Fri, Nov 18, 2016 at 03:40:22PM -0800, Matthieu S wrote:
>
>> Why is the speed so different if one uses --word-diff instead of
>> --word-diff-regex= ? Is it just because my expression is (slightly)
>> more complex than the default one (split on period instead of only
>> whitespace) ? Or is it that the default word-diff is implemented
>> differently/more efficiently? How can I overcome this speed slowdown?
>
> I think it's probably both.
>
> See diff.c:find_word_boundaries(). If there's no regex, we use a simple
> loop over isspace() to find the boundaries. I don't recall anybody
> measuring the performance before, but I'm not surprised to hear that
> matching a regex is slower.
>
> If I look at the output of "perf", though, it looks like we also spend a
> lot more time in xdl_clean_mmatch(). Which isn't surprising. Your regex
> treats commas as boundaries, which is going to generate a lot more
> matches for this particular data set (though the output is the same, I
> think, because of the nature of the change).
>
> I would have expected "--word-diff-regex=[^[:space:]]" to be faster than
> your regex, though, and it does not seem to be.
>
> -Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]