On Mon, May 21, 2012 at 01:45:25PM -0400, Jeff King wrote: > > I don't plan to work on this, but I thought I would point it out in > > case it is causing somebody pain. > > I'll clean up the patch and make one for the filter_refs case, too. Here are the patches. I tested these with three repositories: - the rails/rails network repo, with ~400K refs - the git/git network repo, with ~220 refs - a fake repo made from git.git, with 3 numbered refs pointing at each commit for a total of ~100K refs (I didn't have a real dataset that was ~100K). Before these patches, I never managed to complete a "git clone --mirror file://$repo" on the first two, as I got impatient after 5-10 minutes and killed the process. The third one completed in ~110s. [1/5]: fetch-pack: sort incoming heads [2/5]: fetch-pack: avoid quadratic behavior in remove_duplicates After this one, the "fake" case was down to ~63s. [3/5]: add sorting infrastructure for list refs [4/5]: fetch-pack: sort the list of incoming refs [5/5]: fetch-pack: avoid quadratic loop in filter_refs And after this one, the "fake" case was down to ~32s. Notably, most of the time was spent on the actual object transfer (i.e., before it would hang before even getting to "Counting objects...", and now it starts that almost immediately). Perf corroborates this by showing most of the time in zlib inflate and delta resolution. The rails and git cases run in ~28s and ~37s, respectively, again mostly going to the actual object transfer. So I think this series removes all of the asymptotically bad behavior from this code path. One thing to note about all of these repos is that they tend to have several refs pointing to a single commit. None of the speedups in this series depends on that fact, but it may be that on a repo with more independent refs, we may uncover other code paths (e.g., I know that my fix for mark_complete in ea5f220 improves the case with duplicate refs, but would not help if you really have 400K refs pointing to unique commits[1]). Martin, let me know if this improves things for your many-ref cases (and if you are still seeing slowness in your repos with many refs, let me know which operations cause it). -Peff [1] At the time of that commit, I proposed a fix with a priority queue replacing the commit_list, but we deemed it too much code for a case that was unlikely to happen. But now it sounds like that case is less unlikely than we thought, and now that we have René's linked-list merge-sort code, I think we could fix it with a 2-line change. I'll look into it. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html