Re: git diff looping?

On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote:

> > I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to
> > be caused by a horribly slow system regex implementation; it really
> > chokes on the regex we use to find the "funcname" line for java files.
> 
> Hmm.  Is running under LC_ALL=C LANG=C _with_ the slow system regex help?

No, it remains extremely slow (it is possible that it _is_ faster, but I
never managed to run either case to completion; they are both clearly
orders of magnitude away from acceptable).

> In this particular case it is clear that a good way to fix the problem is
> to replace Solaris's dumb regex implementation with what comes in compat/,
> but at the same time I have to wonder if that funcname pattern for java
> can somehow be simplified, so that it does not require such a sophisticated
> regexp implementation?

That may be a possibility. The default pattern is actually two regexes
(one is a "do not match this" and the other is "match this"). The
problematic one seems to be (and that is a space and a tab between the
brackets):

  ^[ 	]*(([ 	]*[A-Za-z_][A-Za-z_0-9]*){2,}[ 	]*\([^;]*)$

which I determined by setting diff.java.xfuncname just to that (and it
remains slow). Whereas setting it to:

  ^[ 	]*(catch|do|for|if|instanceof|new|return|switch|throw|while)

completes in about 5 seconds of CPU time (in the actual pattern it is
negated, but that shouldn't matter as we do the negation ourselves).

That being said, 5 seconds is still embarrassingly bad. Watch this
(with the Solaris system regex):

  $ git config diff.java.xfuncname '^[ 	]*(catch|do|for|if|instanceof|new|return|switch|throw|while)'
  $ time git diff v0.4.0 >/dev/null
  real    0m5.869s
  user    0m4.720s
  sys     0m0.200s

  $ git config diff.java.xfuncname foo
  $ time git diff v0.4.0 >/dev/null
  real    0m1.895s
  user    0m0.980s
  sys     0m0.210s

So besides learning that this machine is horribly slow, we can see that
running that relatively simple regex takes almost 4 seconds, compared to
a little over 1 second to do the entire rest of the diff. I am inclined
to say that regex performance like that is so bad that we shouldn't care
about optimizing for it, and just use something else.

Bear in mind that the same engine will be used for "grep", too. So you
aren't really doing "git grep" users any favors by linking against such
an awful library.

Really, that performance is so bad that I'm beginning to wonder if I am
somehow measuring something wrong. How could they ship something so
crappy through so many versions?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]