Re: How to reduce pickaxe times for a particular repo?

On Tue, Jun 28 2022, Pavel Rappo wrote:

> On Tue, Jun 28, 2022 at 12:58 PM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>
> <snip>
>
>> But eventually you'll simply run into the regex engine being slow
>
> Since I know very little about git internals, I was under a naive
> impression that a significant, if not comparable to that of regex,
> portion of pickaxe's time is spent on computing diffs between
> revisions. So I assumed that there was a way to pre-compute those
> diffs.

Yes and no, maybe sort of :)

Firstly, -S doesn't involve a diff: it compares the raw pre- and
post-images and checks whether the number of matches differs.

-G does involve computing the diff.
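To make the -S/-G distinction concrete, here's a rough Python model of
the two semantics. It's a sketch under loud assumptions, not git's code:
git diffs blobs with its own xdiff library, whereas this uses Python's
difflib.SequenceMatcher on lists of lines, and the names pickaxe_s and
pickaxe_g are made up for illustration.

```python
import difflib
import re


def pickaxe_s(pre: str, post: str, needle: str) -> bool:
    # Hypothetical -S model: no diff at all. Count occurrences of the
    # literal string in the raw pre-image and post-image; a different
    # count is a hit.
    return pre.count(needle) != post.count(needle)


def pickaxe_g(pre: str, post: str, pattern: str) -> bool:
    # Hypothetical -G model: compute a line diff first (difflib stands in
    # for git's xdiff here), then run the regex over changed lines only.
    regex = re.compile(pattern)
    pre_lines, post_lines = pre.splitlines(), post.splitlines()
    matcher = difflib.SequenceMatcher(a=pre_lines, b=post_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Removed lines come from the pre-image, added lines from the post-image.
        changed = pre_lines[i1:i2] + post_lines[j1:j2]
        if any(regex.search(line) for line in changed):
            return True
    return False


if __name__ == "__main__":
    # Moving a line: the occurrence count is unchanged, so the -S model
    # ignores it, but the line shows up as removed + added in the diff.
    pre = "a\nb\nc\nfoo()\n"
    post = "foo()\na\nb\nc\n"
    print(pickaxe_s(pre, post, "foo"))  # False
    print(pickaxe_g(pre, post, "foo"))  # True
```

This is also why -G can report commits that -S skips: moving or
rewriting a line that contains the match leaves the occurrence count
the same.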

On the one hand, we're fast at making diffs, so that really shouldn't
be significant compared to the speed of a regex engine.

The other side of this is that we're really stupid about how we invoke
the regex engine (historical reasons, backwards compatibility & all
that). We:

 * Aren't compiling the regex once, and using it N times in some cases
   (I have some local patches to fix this)
 * Are computing matches one line at a time, when we could e.g. point
   PCRE to an entire diff with the right line-split options.
 * Are often doing needless work, e.g. in v2.33 I solved an issue with
   us continuing to create diffs when we could abort early (see
   f97fe358576 (pickaxe -G: don't special-case create/delete,
   2021-04-12)), which resulted in some speed-up.
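A rough illustration of the first two points, in Python with its re
module standing in for PCRE (an assumption; this is not what git calls):
compile the pattern once and reuse it, and hand the engine the whole
diff text in one go with re.MULTILINE so ^ and $ still anchor at line
boundaries, instead of calling it once per line. The function names are
hypothetical.

```python
import bisect
import re


def hits_per_line(regex: re.Pattern, text: str) -> list[int]:
    # One regex call per line: the engine is entered and left N times.
    return [i for i, line in enumerate(text.split("\n")) if regex.search(line)]


def hits_whole_buffer(regex: re.Pattern, text: str) -> list[int]:
    # One scan over the whole buffer. re.MULTILINE (set at compile time)
    # makes ^ and $ match at every line boundary. Caveat: a pattern that
    # can match "\n" (e.g. \s) may now span lines, which is what proper
    # line-split options would have to guard against.
    line_starts = [0] + [m.end() for m in re.finditer("\n", text)]
    hits = set()
    for m in regex.finditer(text):
        # Map the match offset back to a 0-based line number.
        hits.add(bisect.bisect_right(line_starts, m.start()) - 1)
    return sorted(hits)


if __name__ == "__main__":
    diff_text = "int x;\nfor (i = 0; i < n; i++)\nreturn x;\nfor (;;)\n"
    # Compile once, then reuse the same compiled pattern for every call.
    regex = re.compile(r"^for\b", re.MULTILINE)
    print(hits_per_line(regex, diff_text))      # [1, 3]
    print(hits_whole_buffer(regex, diff_text))  # [1, 3]
```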

Some of these are tricky to fix.

> <snip>
>
>>  2. Stick that into Lucene with trigram indexing, e.g. ElasticSearch
>>     might make this easy.
>
> <snip>
>
>> For someone familiar with the tools involved that should be about a day
>> to get to a rough hacky solution, it's mostly gluing existing OTS
>> software together.
>
> <snip>
>
> I'll see what I can do with external systems. You see, I initially
> came from a similar repository exposed through OpenGrok. But I think
> that something was wrong with the index or query syntax because I
> couldn't find the things that I knew were there. I was able to secure
> a git repo that was close to that of OpenGrok as I found pickaxe to be
> a robust, albeit slow, alternative for my searches.

This is the first time I've heard of OpenGrok, so no idea, sorry.

One common pitfall with search indexes is that they tend to have a list
of stop words that are never indexed, e.g. Lucene's English defaults
include "for", "or" and other common words, so if you're trying to find
when you altered a for-loop you might silently get no results.
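To illustrate the pitfall, and why the trigram indexing mentioned
earlier in the thread sidesteps it, here's a toy Python comparison. The
stop-word set is a tiny hypothetical sample, not Lucene's real default
list, and the index functions are made-up names, not any Lucene or
Elasticsearch API.

```python
import re
from collections import defaultdict

# Hypothetical stop-word sample, for illustration only.
STOP_WORDS = {"a", "and", "for", "or", "the"}


def word_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Word-level index that silently drops stop words at indexing time.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index


def trigram_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Index every overlapping 3-character substring; nothing is dropped.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for i in range(len(text) - 2):
            index[text[i:i + 3]].add(doc_id)
    return index


def trigram_search(index: dict[str, set[str]], docs: dict[str, str], query: str) -> list[str]:
    grams = [query[i:i + 3] for i in range(len(query) - 2)]
    if grams:
        # Candidates must contain every trigram of the query...
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Queries shorter than 3 characters can't use the index.
        candidates = set(docs)
    # ...then a real substring check drops false positives.
    return sorted(d for d in candidates if query in docs[d])


if __name__ == "__main__":
    docs = {"a.c": "for (i = 0; i < n; i++) sum += i;",
            "b.c": "while (n--) sum += n;"}
    print(word_index(docs).get("for", set()))                 # set()
    print(trigram_search(trigram_index(docs), docs, "for ("))  # ['a.c']
```

This only handles literal substrings; a real trigram engine also has to
work out how to turn a regex into a trigram query.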



