On Thu, Mar 29, 2012 at 08:43:06PM -0600, Martin Fick wrote:

> >It is trying to minimize the transfer cost. By showing a ref to the
> >sending side, you prove you have chains of commits leading to that
> >commit and the sender knows that it does not have to send objects that
> >are reachable from that ref. One thing you could immediately do is
> >de-dup the 100k refs but we may already do that in the current code.
>
> I am sorry I don't quite understand what you are suggesting is taking
> up the CPU time? It doesn't take that much CPU just to gather 100refs
> and send them to the other side, that would be i/o bound. Could you
> explain what is happening on the receiving side that is so time
> consuming?

You said earlier that it is "git rev-list --objects --stdin --not --all"
taking up all the CPU. That is probably called by
check_everything_connected. And that is why it is slow when you push
even a small change, but fast when you push only a deletion (in the
latter case, we skip the check because there are no new objects).

As for why that rev-list is slow, my suspicion is that it may be
quadratic behavior in commit_list_insert_by_date as we process the set
of negative refs. Basically, we keep a priority queue of commits to be
processed in our graph walk, but the queue is stored as a linked list.
So insertion is O(n), and building a list of n items (especially if
they are not in sorted order) is O(n^2).

I've run into this before dealing with repos with many refs (at GitHub,
some of our alternates repositories hit 100K refs, although typically we
have a lot of duplicated refs, as we are storing identical tags from
many repositories).

But that's just a suspicion. I don't have time tonight to work out a
test case.
Is it possible for you to run something like:

  # make a new commit on top of HEAD, but not yet referenced
  sha1=`git commit-tree HEAD^{tree} -p HEAD </dev/null`

  # now do the same "connected" test that receive-pack would do
  git rev-list --objects $sha1 --not --all

That should replicate the slow behavior you are seeing. If that works,
try running the latter command under "perf"; my guess is that you will
see commit_list_insert_by_date as a hot spot.

Even doing this simple test on a moderate repository (my git.git has
~1100 refs), commit_list_insert_by_date accounts for 10% of the CPU
according to perf.

-Peff