Re: git log filtering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:

> What's PCRE performance like? I'd hate to make "git grep" slower, and it 
> would be stupid and confusing to use two different regex libraries..
>
> Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API 
> standpoint, not from a regex standpoint!) wrapper thing, and it might be 
> interesting to hear if doing "git grep" is slower or faster..

The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>


A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps


So the winner seems to vary based on the complexity of the pattern.
There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).

-Peff
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]