Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/22/2012 01:52 AM, Jeff King wrote:
On Mon, May 21, 2012 at 06:14:17PM -0400, Jeff King wrote:

The rails and git cases run in ~28s and ~37s, respectively, again mostly
going to the actual object transfer. So I think this series removes all
of the asymptotically bad behavior from this code path.

One thing to note about all of these repos is that they tend to have
several refs pointing to a single commit. None of the speedups in this
series depends on that fact, but it may be that on a repo with more
independent refs, we may uncover other code paths (e.g., I know that my
fix for mark_complete in ea5f220 improves the case with duplicate refs,
but would not help if you really have 400K refs pointing to unique
commits[1]).

Hmm. So I started to do some experimentation with this and noticed
something odd.

Try doing "git fetch . refs/*:refs/*" in a repository with a large
number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
my machine. But using the version of git in "next" takes about 16.5s.
Bisection points to your 432ad41 (refs: store references hierarchically,
2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.

I'm looking into this.

For your test, were the 400k references all in one "directory", or were they sharded somehow?

A large number of packed references in a single namespace is the worst-case scenario for the hierarchical refs change. Since the refs in a namespace are all stored in a single array, there is no benefit from the hierarchical storage whereas there is some extra cost from traversing the namespace tree for each lookup. But the performance hit shouldn't be this large.

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]