Re: grep vs git grep performance?

On Fri, Oct 27 2017, Joe Perches jotted:

> On Thu, 2017-10-26 at 10:45 -0700, Stefan Beller wrote:
>> On Thu, Oct 26, 2017 at 10:41 AM, Joe Perches <joe@xxxxxxxxxxx> wrote:
>> > On Thu, 2017-10-26 at 09:58 -0700, Stefan Beller wrote:
>> > > + Avar who knows a thing about pcre (I assume the regex compilation
>> > > has impact on grep speed)
>> > >
>> > > On Thu, Oct 26, 2017 at 8:02 AM, Joe Perches <joe@xxxxxxxxxxx> wrote:
>> > > > Comparing a cache warm git grep vs command line grep
>> > > > shows significant differences in cpu & wall clock.
>> > > >
>> > > > Any ideas how to improve this?
>> > > >
>> > > > $ time git grep "\bseq_.*%p\W" | wc -l
>> > > > 112
>> > > >
>> > > > real    0m4.271s
>> > > > user    0m15.520s
>> > > > sys     0m0.395s
>> > > >
>> > > > $ time grep -r --include=*.[ch] "\bseq_.*%p\W" * | wc -l
>> > > > 112
>> > > >
>> > > > real    0m1.164s
>> > > > user    0m0.847s
>> > > > sys     0m0.314s
>> > > >
>> > >
>> > > I wonder how much is algorithmic advantage vs coding/micro
>> > > optimization that we can do.
>> >
>> > As do I.  I presume this is libpcre related.
>> >
>> > For instance, git grep performance is better than grep for:
>> >
>> > $ time git grep -w "seq_printf" -- "*.[ch]" | wc -l
>> > 8609
>> >
>> > real    0m0.301s
>> > user    0m0.548s
>> > sys     0m0.372s
>> >
>> > $ time grep -w -r --include=*.[ch] "seq_printf" * | wc -l
>> > 8609
>> >
>> > real    0m0.706s
>> > user    0m0.396s
>> > sys     0m0.309s
>> >
>>
>> One important piece of information is what version of Git you are running.
>>
>>
>> $ git tag --contains origin/ab/pcre-v2
>> v2.14.0
>
> v2.10
>
>> ...
>>
>> (and the version of pcre, see the numbers)
>> https://git.kernel.org/pub/scm/git/git.git/commit/?id=94da9193a6eb8f1085d611c04ff8bbb4f5ae1e0a
>
> I definitely didn't have that one.
>
> I recompiled git latest (with USE_LIBPCRE2) and reran.
>
> Here are the results
>
> $ git --version
> git version 2.15.0.rc2.48.g4e40fb3
>
> $ time git grep -P "\bseq_.*%p\W" -- "*.[ch]" | wc -l
> 112
>
> real	0m0.437s
> user	0m1.008s
> sys	0m0.381s
>
> So, git grep performance has already been
> quite successfully improved.

...and I have WIP patches to use the PCRE engine for patterns without -P
which I intend to start sending soon after the next release.
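
For anyone who wants to rerun this kind of comparison, a minimal sketch
follows. It uses a throwaway repository rather than the kernel tree, and the
file names and contents are made up for illustration. It only checks that
git grep and grep -r agree on the match count for the -w case quoted above,
which is worth confirming before comparing timings:

```shell
#!/bin/sh
# Sketch only: the repository, file names and contents are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .

# Two C sources with whole-word hits, one near miss, and one non-C file.
printf 'seq_printf(m, "%%p\\n", ptr);\nint seq_printf_x;\n' > a.c
printf 'seq_printf(m, "x");\n' > b.h
printf 'seq_printf in a text file\n' > notes.txt

# git grep only searches tracked files, so add them to the index.
git add a.c b.h notes.txt

# Same pattern and file filter as in the thread; tr strips wc padding.
git_count=$(git grep -w "seq_printf" -- "*.[ch]" | wc -l | tr -d ' ')
grep_count=$(grep -w -r --include='*.[ch]' "seq_printf" . | wc -l | tr -d ' ')

echo "git grep: $git_count, grep -r: $grep_count"
```

Once the counts match, each pipeline can be prefixed with time, as in the
runs quoted above. For the regex case, passing -P to git grep selects the
PCRE engine, provided git was built with libpcre support.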


