On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:

> What's PCRE performance like? I'd hate to make "git grep" slower, and it
> would be stupid and confusing to use two different regex libraries..
>
> Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API
> standpoint, not from a regex standpoint!) wrapper thing, and it might be
> interesting to hear if doing "git grep" is slower or faster..

The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>

A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps

So the winner seems to vary based on the complexity of the pattern.
There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).

-Peff
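
For anyone who wants to try the "conditional" variant, here is a minimal
sketch of what the header swap could look like in isolation. It is not
part of the patch above: the USE_PCRE macro and the line_matches() helper
are hypothetical names made up for illustration. It relies only on the
POSIX calls regcomp(), regexec() and regfree(), which both <regex.h> and
PCRE's <pcreposix.h> wrapper declare.

```c
#include <stddef.h>

#ifdef USE_PCRE			/* hypothetical build flag, not in the patch */
#include <pcreposix.h>		/* PCRE's POSIX-style wrapper */
#else
#include <regex.h>		/* the C library's POSIX regex */
#endif

/*
 * Hypothetical helper: returns 1 if 'line' matches 'pattern',
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int hit;

	/* REG_NOSUB: we only need a yes/no answer, not match offsets */
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return -1;
	hit = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

Built without the flag this uses the C library's regex; built with
-DUSE_PCRE it would also need -lpcreposix -lpcre at link time, as in the
Makefile hunk. The same pattern string can be interpreted differently by
the two libraries, which is the behavior mismatch described above.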