Re: Git is not scalable with too many refs/*

Shawn Pearce <spearce@xxxxxxxxxxx> · Fri, 10 Jun 2011 12:41:39 -0700

On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson <ae@xxxxxx> wrote:
> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>
>> A lot of operations toss every commit that a reference points at into
>> the revision walker's LRU queue. If you have a tag pointing to every
>> commit, then the entire project history enters the LRU queue at once,
>> up front. That queue is managed with O(N^2) insertion time. And the
>> entire queue has to be filled before anything can be output.
>
> Hmm. Since we're using pre-hashed data with an obvious lookup method
> we should be able to do much, much better than O(n^2) for insertion
> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
> trie, resulting in a depth, lookup and insertion complexity of 20. It
> would waste some memory but it might be worth it for fixed asymptotic
> complexity for both insertion and lookup.

Not really.

The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
descending. Those aren't pre-hashed. The O(N^2) insertion is because
the code is trying to find where this commit belongs in the list of
commits as sorted by commit timestamp.

There are some priority queue datastructures designed for this sort of
work, e.g. a calendar queue might help. But its not as simple as a 1
byte trie.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html