Re: Git is not scalable with too many refs/*

Andreas Ericsson <ae@xxxxxx> · Mon, 13 Jun 2011 09:08:09 +0200

On 06/10/2011 09:41 PM, Shawn Pearce wrote:
> On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson<ae@xxxxxx>  wrote:
>> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>>
>>> A lot of operations toss every commit that a reference points at into
>>> the revision walker's LRU queue. If you have a tag pointing to every
>>> commit, then the entire project history enters the LRU queue at once,
>>> up front. That queue is managed with O(N^2) insertion time. And the
>>> entire queue has to be filled before anything can be output.
>>
>> Hmm. Since we're using pre-hashed data with an obvious lookup method
>> we should be able to do much, much better than O(n^2) for insertion
>> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
>> trie, resulting in a depth, lookup and insertion complexity of 20. It
>> would waste some memory but it might be worth it for fixed asymptotic
>> complexity for both insertion and lookup.
> 
> Not really.
> 
> The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
> 

Hmm. We should still be able to do better than that, and particularly
for the "tag-each-commit" workflow. Since it's most likely those tags
are generated using incrementing numbers, we could have a cut-off where
we first parse all the refs and make an optimistic assumption that an
alphabetical sort of the refs provides a map of insertion-points for
the commits. Since the best case behaviour is still O(1) for insertion
sort and it's unlikely that thousands of refs are in random order, that
should cause the vast majority of the refs we insert to follow the best
case scenario.

This will fall on its arse when people start doing hg-ref -> git-commit
tags ofcourse, but that doesn't seem to be happening, or at least not to
the same extent as with svn-revisions -> git-gommit mapping.

We're still not improving the asymptotic complexity, but it's a pretty
safe bet that we for a vast majority of cases improve wallclock runtime
by a hefty amount with a relatively minor effort.

-- 
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html