On Thu, Jun 07 2018, Matthew Wilcox wrote:

> On Thu, Jun 07, 2018 at 09:09:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jun 07 2018, Matthew Wilcox wrote:
>> > If the first atom of a regex is a bracket expression with an inverted range,
>> > git grep is very slow.
>>
>> I have some WIP patches to fix all of this, which I'll hopefully submit
>> before 2.19 is out the door.
>>
>> What you've discovered here is how shitty your libc regex engine is,
>> because unless you provide -P and compile with a reasonably up-to-date
>> libpcre (preferably v2) with JIT that's what you'll get.
>
> I'm using Debian's build, and it is linked against a recent libpcre2:
> $ ldd /usr/lib/git-core/git
>         libpcre2-8.so.0 => /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f59ad5f2000)
> $ dpkg --status libpcre2-8-0
> Version: 10.31-3
>
> But I wasn't using -P.  If I do, then I see the performance numbers you do:
>
> $ time git grep -P '[^t]truct_size' >/dev/null
> real    0m0.354s
> user    0m0.340s
> sys     0m0.639s
> $ time git grep -P 'struct_size' >/dev/null
> real    0m0.336s
> user    0m0.552s
> sys     0m0.457s
> $ time git grep 'struct_size' >/dev/null
> real    0m0.335s
> user    0m0.535s
> sys     0m0.474s
>
>> So you need to just use an up-to-date libpcre2 & -P and performance
>> won't suck.

Yeah that's recent enough & will get you all the benefits.

> I don't tend to use terribly advanced regexps, so I'll just set
> grep.patternType to 'perl' and then it'll automatically be fast for me
> without your patches ;-)

Indeed, if you're happy with that that'll do it.

>> My WIP patches will make us use PCRE for all grep modes, using an API it
>> has to convert basic & extended regexp syntax to its own syntax, so
>> we'll be able to do that transparently.
>
> That's clearly the right answer.  Thanks!

Yeah, unfortunately git-grep's default is "basic" regexp which has a
really atrocious syntax that's different enough from extended & Perl's
that we probably couldn't just switch it over.

That won't be needed with my patches, but maybe I'll follow up with
something to s/basic/extended/g by default, because one side effect of
having the pattern converter is that we could emit a warning whenever
the user has a pattern that would behave differently under
extended/perl, so we can see how common that is.
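
To make the "different enough" point concrete, here is a minimal sketch of one pattern that changes meaning between the basic and extended syntaxes. It uses plain grep rather than git grep, purely as an assumption so that it runs outside a repository, and the two sample lines are made up for illustration:

```shell
#!/bin/sh
# Two made-up sample lines: one contains a literal "+", the other a run of "a"s.
sample='a+b
aab'

# Basic syntax (grep's default, as with git grep): "+" is an ordinary character,
# so the pattern only matches the literal text "a+b".
bre_hits=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended syntax (-E): "+" means "one or more of the preceding atom",
# so the pattern matches one or more "a" followed directly by "b".
ere_hits=$(printf '%s\n' "$sample" | grep -E 'a+b')

echo "basic:    $bre_hits"
echo "extended: $ere_hits"
```

The same pattern selects a different line in each mode, which is the kind of silent behaviour change a warning from the pattern converter could flag before any default is switched.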