On Sun, Apr 01, 2012 at 12:11:01AM +0200, René Scharfe wrote: > Speed up prepare_revision_walk() by adding commits without sorting > to the commit_list and at the end sort the list in one go. Thanks > to mergesort() working behind the scenes, this is a lot faster for > large numbers of commits than the current insert sort. I think this is probably a sane thing to do, but I have two slight misgivings: 1. Is it worth the complexity of the linked-list mergesort? I was planning to just build an array, qsort it, and then put the results into a linked list. The patch for that is below for reference. It's a lot less code and complexity for the same performance (actually, I measured it at 1% faster, but that is probably negligible). The downside is that it is not nicely encapsulated in commit_list_sort_by_date(). We call the latter from two other places; I don't know if they can be fed with enough commits to actually benefit from the performance gain or not. 2. I'm not super happy about fixing this one spot. This quadratic behavior comes up in a lot of places, and we're slowly hacking them one by one. E.g., this does nothing to help the same case in fetch-pack.c:mark_complete[1]. Nor does it help the fact that when we follow parents, we will do an O(n) insert_by_date for each commit we insert. The latter is largely saved by the locality of timestamps (i.e., timestamps of the parents of recently popped commits tend to be near the front of the list), as well as the hack in fce87ae (Fix quadratic performance in rewrite_one., 2008-07-12). So I wonder if in the long term we would benefit from a better data structure, which would make these problems just go away. That being said, there is a lot of code to be updated with such a change, so even if we do want to do that eventually, a quick fix like this is probably still a good thing. -Peff [1] I fixed the mark_complete thing in ea5f220 (fetch: avoid repeated commits in mark_complete, 2011-05-19), but only for exact-duplicate commits. The real-world case where it came up was an "alternates" repository that held refs for many clones (so we had hundreds or thousands of copies of each tag). But on a repository like the one we are testing on, I think it would be similarly slow. --- Here's the qsort-in-array patch, for reference. diff --git a/revision.c b/revision.c index b3554ed..22c26d0 100644 --- a/revision.c +++ b/revision.c @@ -2062,10 +2062,24 @@ static void set_children(struct rev_info *revs) } } +static int commit_compare_by_date(const void *va, const void *vb) +{ + const struct commit *a = va; + const struct commit *b = vb; + if (a->date < b->date) + return -1; + if (b->date < a->date) + return 1; + return 0; +} + int prepare_revision_walk(struct rev_info *revs) { int nr = revs->pending.nr; struct object_array_entry *e, *list; + struct commit **commits = NULL; + int commits_nr = 0, commits_alloc = 0; + int i; e = list = revs->pending.objects; revs->pending.nr = 0; @@ -2076,11 +2090,17 @@ int prepare_revision_walk(struct rev_info *revs) if (commit) { if (!(commit->object.flags & SEEN)) { commit->object.flags |= SEEN; - commit_list_insert_by_date(commit, &revs->commits); + ALLOC_GROW(commits, commits_nr + 1, commits_alloc); + commits[commits_nr++] = commit; } } e++; } + qsort(commits, commits_nr, sizeof(*commits), commit_compare_by_date); + for (i = commits_nr - 1; i >= 0; i--) + commit_list_insert(commits[i], &revs->commits); + free(commits); + if (!revs->leak_pending) free(list); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html