Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/22/2012 06:11 AM, Jeff King wrote:
On Tue, May 22, 2012 at 05:59:29AM +0200, Michael Haggerty wrote:

Try doing "git fetch . refs/*:refs/*" in a repository with a large
number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
my machine. But using the version of git in "next" takes about 16.5s.
Bisection points to your 432ad41 (refs: store references hierarchically,
2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.

I'm looking into this.

For your test, were the 400k references all in one "directory", or
were they sharded somehow?

Sharded. This was the rails network repo, so it basically looks like
this:

   refs/remotes/XXXXXXX/heads/master
   refs/remotes/XXXXXXX/tags/v1.0
   refs/remotes/XXXXXXX/tags/v1.1
   ...
   refs/remotes/XXXXXXX/tags/v3.5
   refs/remotes/YYYYYYY/heads/master
   refs/remotes/YYYYYYY/tags/v1.0
   refs/remotes/YYYYYYY/tags/v1.1
   ...

Basically the same 90 or so refs you would have in a normal repository
(a branch or two, and a bunch of tags), repeated over and over under
numbered remote hierarchies (i.e., each remote is basically a copy of
the refs from somebody's fork of the repo).

I can provide you with the exact repo if you want, but it is about 100M.

Interestingly, I don't seem to get nearly as much slow-down in my "fake"
many-refs repo, which has all 100K entries directly in refs/heads.

That could explain why I cannot reproduce your result. I have done all of my testing in fake repos (with sharded and non-sharded variants) with very little file content.

If it is not too much trouble, please let me know where I can obtain your test repo and what commands you used to get your result. (Was the local repo already a full clone of the remote, including all 400k references? How was the remote set up--sharing data or not, local or remote? Warm or cold disk cache?)

Thanks,
Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]