On 06/09/2011 05:56 PM, Shawn Pearce wrote:
> On Thu, Jun 9, 2011 at 08:52, A Large Angry SCM <gitzilla@xxxxxxxxx> wrote:
>> On 06/09/2011 11:23 AM, Shawn Pearce wrote:
>>> Having a reference to every commit in the repository is horrifically
>>> slow. We run into this with Gerrit Code Review and I need to find
>>> another solution. Git just wasn't meant to process repositories like
>>> this.
>>
>> Assuming a very large number of refs, what is it that makes git so
>> horrifically slow? Is there a design or implementation lesson here?
>
> A few things.
>
> Git does a sequential scan of all references when it first needs to
> access references for an operation. This requires reading the entire
> packed-refs file, and the recursive scan of the "refs/" subdirectory
> for any loose refs that might override the packed-refs file.
>
> A lot of operations toss every commit that a reference points at into
> the revision walker's LRU queue. If you have a tag pointing to every
> commit, then the entire project history enters the LRU queue at once,
> up front. That queue has O(N) insertion time per commit, so filling
> it with N commits costs O(N^2). And the entire queue has to be filled
> before anything can be output.

Hmm. Since we're using pre-hashed data with an obvious lookup method,
we should be able to do much, much better than O(n^2) for insertion
and better than O(n) for worst-case lookups.

I'm thinking of a trie branching on one byte at a time: with 20-byte
SHA-1 keys the depth is fixed at 20, so both lookup and insertion are
bounded by 20 node traversals. It would waste some memory, but that
might be worth it for fixed asymptotic complexity on both insertion
and lookup.

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
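
[For concreteness, a minimal sketch of the 1-byte trie idea above,
assuming raw 20-byte SHA-1 keys. The names (trie_node, trie_insert,
trie_lookup) are illustrative only and not part of git.git:]

#include <stdlib.h>

#define SHA1_RAWSZ 20

struct trie_node {
	struct trie_node *child[256];	/* one slot per possible byte value */
	void *value;			/* payload at depth 20; NULL if absent */
};

/*
 * Map a 20-byte SHA-1 to a value.  Walks (and creates) at most
 * SHA1_RAWSZ nodes, so insertion cost is fixed no matter how many
 * keys are already stored.  Returns 0 on success, -1 on allocation
 * failure.
 */
static int trie_insert(struct trie_node *root,
		       const unsigned char *sha1, void *value)
{
	struct trie_node *n = root;
	int i;

	for (i = 0; i < SHA1_RAWSZ; i++) {
		unsigned char b = sha1[i];
		if (!n->child[b]) {
			n->child[b] = calloc(1, sizeof(*n));
			if (!n->child[b])
				return -1;
		}
		n = n->child[b];
	}
	n->value = value;
	return 0;
}

/*
 * The same fixed-depth walk: worst case is 20 pointer chases
 * regardless of how many refs are in the repository.
 */
static void *trie_lookup(struct trie_node *root,
			 const unsigned char *sha1)
{
	struct trie_node *n = root;
	int i;

	for (i = 0; i < SHA1_RAWSZ && n; i++)
		n = n->child[sha1[i]];
	return n ? n->value : NULL;
}

[Usage would be along the lines of:

	struct trie_node *root = calloc(1, sizeof(*root));
	trie_insert(root, sha1, commit);
	struct commit *c = trie_lookup(root, sha1);

The memory waste mentioned above is visible here: every node carries
256 child pointers, about 2 KiB per node on a 64-bit host. A real
implementation would likely collapse single-child chains, radix-tree
style, to reclaim most of that while keeping the bounded depth.]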