Re: Git is not scalable with too many refs/*

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 10 Jun 2011 13:12:12 -0700 (PDT)

Shawn Pearce <spearce@xxxxxxxxxxx> writes:
> On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson <ae@xxxxxx> wrote:
>> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>>
>>> A lot of operations toss every commit that a reference points at into
>>> the revision walker's LRU queue. If you have a tag pointing to every
>>> commit, then the entire project history enters the LRU queue at once,
>>> up front. That queue is managed with O(N^2) insertion time. And the
>>> entire queue has to be filled before anything can be output.
>>
>> Hmm. Since we're using pre-hashed data with an obvious lookup method
>> we should be able to do much, much better than O(n^2) for insertion
>> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
>> trie, resulting in a depth, lookup and insertion complexity of 20. It
>> would waste some memory but it might be worth it for fixed asymptotic
>> complexity for both insertion and lookup.
> 
> Not really.
> 
> The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
> 
> There are some priority queue datastructures designed for this sort of
> work, e.g. a calendar queue might help. But its not as simple as a 1
> byte trie.

In the case of Subversion numbers (revision number to hash mapping)
sorted by name (in version order at least) means sorted by date.  I
wonder if there is data structure for which this is optimum insertion
order (like for insertion sort almost sorted data is best case).

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html