On 05/22/2012 01:52 AM, Jeff King wrote:
> On Mon, May 21, 2012 at 06:14:17PM -0400, Jeff King wrote:
>
>> The rails and git cases run in ~28s and ~37s, respectively, again mostly
>> going to the actual object transfer. So I think this series removes all
>> of the asymptotically bad behavior from this code path.
>>
>> One thing to note about all of these repos is that they tend to have
>> several refs pointing to a single commit. None of the speedups in this
>> series depends on that fact, but it may be that on a repo with more
>> independent refs, we may uncover other code paths (e.g., I know that my
>> fix for mark_complete in ea5f220 improves the case with duplicate refs,
>> but would not help if you really have 400K refs pointing to unique
>> commits[1]).
>
> Hmm. So I started to do some experimentation with this and noticed
> something odd.
>
> Try doing "git fetch . refs/*:refs/*" in a repository with a large
> number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
> my machine. But using the version of git in "next" takes about 16.5s.
>
> Bisection points to your 432ad41 (refs: store references hierarchically,
> 2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.

I'm looking into this.

For your test, were the 400k references all in one "directory", or were
they sharded somehow?

A large number of packed references in a single namespace is the
worst-case scenario for the hierarchical refs change. Since the refs in
a namespace are all stored in a single array, there is no benefit from
the hierarchical storage whereas there is some extra cost from
traversing the namespace tree for each lookup. But the performance hit
shouldn't be this large.

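To illustrate the trade-off described above, here is a minimal sketch in
Python. It is not git's refs.c code; the RefDir and HierarchicalRefs names
and the (value, levels, array size) return shape are made up for the
example, and it uses 1000 refs rather than 400K. It only assumes what is
stated above: each namespace keeps its refs in one sorted array, and a
lookup first walks the namespace tree and then searches that array.

```python
import bisect


class RefDir:
    """One namespace level: child directories plus a sorted array of refs."""

    def __init__(self):
        self.subdirs = {}   # directory name -> RefDir
        self.names = []     # leaf ref names, kept sorted
        self.values = []    # object ids, parallel to self.names


class HierarchicalRefs:
    """Hypothetical ref store; not the data structure used by git itself."""

    def __init__(self):
        self.root = RefDir()

    def _find_dir(self, dirs, create):
        """Walk the namespace tree; return (RefDir or None, levels walked)."""
        node, levels = self.root, 0
        for d in dirs:
            if create:
                node = node.subdirs.setdefault(d, RefDir())
            else:
                node = node.subdirs.get(d)
                if node is None:
                    return None, levels
            levels += 1
        return node, levels

    def add(self, refname, value):
        *dirs, leaf = refname.split("/")
        node, _ = self._find_dir(dirs, create=True)
        i = bisect.bisect_left(node.names, leaf)
        if i < len(node.names) and node.names[i] == leaf:
            node.values[i] = value          # overwrite an existing ref
        else:
            node.names.insert(i, leaf)      # keep the array sorted
            node.values.insert(i, value)

    def lookup(self, refname):
        """Return (value or None, tree levels walked, size of array searched)."""
        *dirs, leaf = refname.split("/")
        node, levels = self._find_dir(dirs, create=False)
        if node is None:
            return None, levels, 0
        i = bisect.bisect_left(node.names, leaf)
        found = i < len(node.names) and node.names[i] == leaf
        return (node.values[i] if found else None), levels, len(node.names)


flat = HierarchicalRefs()      # every ref directly under refs/heads/
sharded = HierarchicalRefs()   # refs spread over 100 subdirectories
for n in range(1000):
    flat.add("refs/heads/b%04d" % n, n)
    sharded.add("refs/heads/%02d/b%04d" % (n % 100, n), n)

flat_result = flat.lookup("refs/heads/b0042")
sharded_result = sharded.lookup("refs/heads/42/b0042")
print(flat_result)     # (42, 2, 1000)
print(sharded_result)  # (42, 3, 10)
```

In the flat layout the lookup still pays for the two-level tree walk but
then has to search the full 1000-entry array, so the hierarchy buys
nothing. In the sharded layout one extra level of walking cuts the array
to 10 entries.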
Michael
--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/