Re: git and larger trees, not so fast?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 9 Aug 2007, moe wrote:
> 
> i made some tests on latest master branch
> (1.5.3.rc4.29.g74276) and it seems like git
> hits a wall somewhere above ~50k files.

Good catch. Definitely not acceptable performance.

We seem to spend a lot of our time in memcpy:

	samples  %        image name               app name                 symbol name
	200527   25.4551  libc-2.6.so              libc-2.6.so              _wordcopy_bwd_aligned
	104505   13.2660  libc-2.6.so              libc-2.6.so              _wordcopy_fwd_aligned
	99185    12.5907  libz.so.1.2.3            libz.so.1.2.3            (no symbols)
	83452    10.5935  libc-2.5.so              libc-2.5.so              (no symbols)
	54203     6.8806  git                      git                      assign_blame
	46153     5.8587  git                      git                      read_directory_recursive
	27665     3.5118  git                      git                      handle_split
	21385     2.7146  vmlinux                  vmlinux                  blk_complete_sgv4_hdr_rq
	20745     2.6334  git                      git                      read_packed_refs
	12709     1.6133  git                      git                      builtin_diffstat
	7829      0.9938  git                      git                      show_patch_diff
	...

but the silly thing is, this is only true if you give the filenames 
explicitly!

Lookie here:

	[torvalds@woody bummer]$ date >50/500
	[torvalds@woody bummer]$ time git commit -a -m 'expose the turtle'
	Created commit 25ca22d: expose the turtle
	 1 files changed, 1 insertions(+), 1 deletions(-)
	
	real    0m4.612s
	user    0m4.224s
	sys     0m0.412s

	[torvalds@woody bummer]$ date >50/500
	[torvalds@woody bummer]$ time git commit -m 'expose the turtle' 50/500
	Created commit 009f6b5: expose the turtle
	 1 files changed, 1 insertions(+), 1 deletions(-)
	
	real    0m12.464s
	user    0m12.129s
	sys     0m0.336s

ie we take almost three times longer with explicitly naming the file, than 
when just using "git commit -a". Oops.

That said, even the 4.6 seconds is really not acceptable: this is on a 
good 2.6GHz Core 2 Duo too, so on weaker hardware it would be quite 
painful.

I haven't looked at *why* it's that slow, but it's not anything really 
fundamental, the basic operations are fast:

	[torvalds@woody bummer]$ time git add 50/500

	real    0m0.064s
	user    0m0.048s
	sys     0m0.016s

	[torvalds@woody bummer]$ time git write-tree
	7480230419e510c93082a4a19e23d928a426973a
	
	real    0m0.069s
	user    0m0.048s
	sys     0m0.024s

	[torvalds@woody bummer]$ time git diff
	
	real    0m0.127s
	user    0m0.000s
	sys     0m0.000s

so it's not the "lstat()" that we do on all files, or the write-tree 
(which are all O(n) in files, with a rather small constant), but some 
O(n**2) behaviour elsewhere.

And all the expense seems to be in not the commit itself, but in

	[torvalds@woody bummer]$ time git 'runstatus' '--nocolor'

	real    0m4.208s
	user    0m4.068s
	sys     0m0.140s

and that thing seems to suck really really hard.

Doing an ltrace on it shows tons and tons of:

	...
	strlen("35")
	strlen("349")
	calloc(1, 72)
	memcpy(0x73034e, "10/", 3)
	memcpy(0x730351, "349", 4)
	memmove(0x2ab637f41e80, 0x2ab637f41e78, 781768)
	...

but I haven't looked at where they come from yet.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux