On 06/09/2011 05:56 PM, Shawn Pearce wrote:
> On Thu, Jun 9, 2011 at 08:52, A Large Angry SCM <gitzilla@xxxxxxxxx> wrote:
>> On 06/09/2011 11:23 AM, Shawn Pearce wrote:
>>> Having a reference to every commit in the repository is horrifically
>>> slow. We run into this with Gerrit Code Review and I need to find
>>> another solution. Git just wasn't meant to process repositories like
>>> this.
>>
>> Assuming a very large number of refs, what is it that makes git so
>> horrifically slow? Is there a design or implementation lesson here?
>
> A few things.
>
> Git does a sequential scan of all references when it first needs to
> access references for an operation. This requires reading the entire
> packed-refs file, and the recursive scan of the "refs/" subdirectory
> for any loose refs that might override the packed-refs file.
>
> A lot of operations toss every commit that a reference points at into
> the revision walker's LRU queue. If you have a tag pointing to every
> commit, then the entire project history enters the LRU queue at once,
> up front. That queue has O(N) insertion time per commit, so filling
> it with N commits costs O(N^2). And the entire queue has to be filled
> before anything can be output.

Hmm. Since we're using pre-hashed data with an obvious lookup method,
we should be able to do much, much better than O(n^2) for insertion
and better than O(n) for worst-case lookups.

I'm thinking of a trie branching on one byte at a time: with 20-byte
SHA-1 keys the depth is fixed at 20, so both lookup and insertion are
bounded by 20 node traversals. It would waste some memory, but that
might be worth it for fixed asymptotic complexity on both insertion
and lookup.

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
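
[For concreteness, a minimal sketch of the 1-byte trie idea above,
assuming raw 20-byte SHA-1 keys. The names (trie_node, trie_insert,
trie_lookup) are illustrative only and not part of git.git:]

#include <stdlib.h>

#define SHA1_RAWSZ 20

struct trie_node {
	struct trie_node *child[256];	/* one slot per possible byte value */
	void *value;			/* payload at depth 20; NULL if absent */
};

/*
 * Map a 20-byte SHA-1 to a value.  Walks (and creates) at most
 * SHA1_RAWSZ nodes, so insertion cost is fixed no matter how many
 * keys are already stored.  Returns 0 on success, -1 on allocation
 * failure.
 */
static int trie_insert(struct trie_node *root,
		       const unsigned char *sha1, void *value)
{
	struct trie_node *n = root;
	int i;

	for (i = 0; i < SHA1_RAWSZ; i++) {
		unsigned char b = sha1[i];
		if (!n->child[b]) {
			n->child[b] = calloc(1, sizeof(*n));
			if (!n->child[b])
				return -1;
		}
		n = n->child[b];
	}
	n->value = value;
	return 0;
}

/*
 * The same fixed-depth walk: worst case is 20 pointer chases
 * regardless of how many refs are in the repository.
 */
static void *trie_lookup(struct trie_node *root,
			 const unsigned char *sha1)
{
	struct trie_node *n = root;
	int i;

	for (i = 0; i < SHA1_RAWSZ && n; i++)
		n = n->child[sha1[i]];
	return n ? n->value : NULL;
}

[Usage would be along the lines of:

	struct trie_node *root = calloc(1, sizeof(*root));
	trie_insert(root, sha1, commit);
	struct commit *c = trie_lookup(root, sha1);

The memory waste mentioned above is visible here: every node carries
256 child pointers, about 2 KiB per node on a 64-bit host. A real
implementation would likely collapse single-child chains, radix-tree
style, to reclaim most of that while keeping the bounded depth.]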