Ondřej Bílka <neleai@xxxxxxxxx> writes:

> One solution would be to use the same trick as was done in Google Code:
> build and keep a database of trigraphs and which files contain how many
> of them. When a query is made, check only those files that have the
> appropriate combination of trigraphs.

This depends on how you go about reducing the database overhead, I think.

For example, a very naive approach would be to create such a trigraph hit index for each and every commit, for all paths. When "git grep $commit $pattern" is run, you would consult such a table with $commit and the potential trigraphs derived from $pattern to grab the paths your hits _might_ be in.

But the contents of a path usually do not change in each and every commit. So you may want to instead index by blob object name (i.e. which trigraphs appear in which blobs). Once you go that route, however, your "git grep $commit $pattern" needs to read and enumerate all the blobs that appear in $commit's tree to see which blobs may potentially have hits. You would then need to extend the index every time you make a new commit, for any blobs whose trigraphs have not yet been counted. The nice thing is that once a blob (or a commit, for that matter) is created and its object name is known, its contents will never change, so you can index it once and reuse the result many times.

But I am not yet convinced that pre-indexing is an overall win, compared to the cost of maintaining such a database.
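To make the blob-level variant concrete, here is a minimal sketch of the idea in Python. This is not git code; the blob names are plain strings standing in for object names, and the filter is only usable for fixed-string patterns (a regex would first have to be decomposed into required literals). The point is only to show the index shape and the query-time intersection.

```python
# Hypothetical sketch of a blob-level trigraph index, as discussed above.
# Assumptions: blobs are given as {name: content} strings, patterns are
# fixed strings at least 3 characters long.

def trigraphs(text):
    """All 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(blobs):
    """Map each trigraph to the set of blob names containing it.
    Done once per blob; reusable because blob contents never change."""
    index = {}
    for name, content in blobs.items():
        for t in trigraphs(content):
            index.setdefault(t, set()).add(name)
    return index

def candidate_blobs(index, all_blobs, pattern):
    """Blobs that contain every trigraph of the pattern.
    Only these blobs need to be opened and actually grepped."""
    ts = trigraphs(pattern)
    if not ts:
        return set(all_blobs)  # pattern too short to filter on
    return set.intersection(*(index.get(t, set()) for t in ts))

blobs = {
    "blob1": "int main(void) { return 0; }",
    "blob2": 'printf("hello, world");',
}
index = build_index(blobs)
print(candidate_blobs(index, blobs, "main"))  # only blob1 can match
```

At query time, "git grep $commit $pattern" would still have to walk $commit's tree to learn which blob names are reachable, then intersect that set with the candidates above; the index only prunes which blobs get opened, it does not avoid the tree enumeration.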