On Wed, 28 Nov 2007, Linus Torvalds wrote:
> 
> Sadly, one big reason for performance downsides is actually hard to 
> measure: since we mmap() all the really critical data structures (the 
> pack-file index and data), it doesn't show up really well in any of the 
> otherwise really useful performance tools (eg strace and ltrace), because 
> the bulk of the time is actually spent not in libraries or in system 
> calls, but simply on regular instructions that take a page fault.

Side note: while the ".gitignore" issue actually overshadowed everything 
else for things like "git checkout" and "git merge", I do have some traces 
for page faults too. They're kind of interesting, but actually show that 
we do reasonably well in this area - certainly much better than if we 
tried to keep things in multiple independent files.

The pack-files (both index and data) are accessed somewhat randomly, but 
there is still enough locality that doing read-ahead and clustering really 
does help. So while "git checkout" actually ends up reading a lot of git 
"tree" objects (as many tree objects as there are .gitignore files that we 
try to open unnecessarily!), we actually spend a *lot* less time on the 
pack-file operations than we do on trying to do the futile ".gitignore" 
opens in every directory.

If somebody cares, I do have the page fault traces on those things for the 
same "git checkout" that had the horrid .gitignore behaviour.

What's interesting is:

 - the pack-file really does get accessed in nice increasing patterns for 
   the top-of-tree. So the fact that we sort objects "topologically" does 
   actually seem to work pretty well (yeah, it was obvious, but it's nice 
   to have a real page fault trace that shows the effect!)

   The accesses seemed to be *totally* ordered. There were obviously gaps 
   (ie they weren't *dense*, since we only looked at one version of the 
   trees in "git checkout"), but at the same time, it certainly wasn't 
   random or causing any horrible seeking back-and-forth.

 - we're *reasonably* dense in pack-file accesses. Not wonderful, but not 
   horrible. That in turn means that read-ahead kicks in and works ok.

   The pattern for the pack-file that I caught looks like this (the format 
   is "pageindex: cycles-to-fill-in-kernel"):

	6810: 15202545
	6811: 353
	6812: 267
	6813: 476
	6814: 559
	6826: 1086510
	6878: 13948057
	6894: 9756446
	6896: 300
	6899: 307
	6903: 319
	6907: 293
	6910: 666120
	6912: 401
	6913: 330
	6916: 401
	6918: 303
	6919: 330
	6931: 784373
	6943: 405
	6944: 187
	6945: 315
	...

   and here you can see that read-ahead works fine about 60% of the time, 
   with most pages taking just 300-500 CPU cycles (no actual IO, of 
   course!) to find/look up, but then when there are discontinuities we 
   have the cost of the real IO, and the cost for those obviously then 
   jumps into the tens of millions of cycles (ie we're talking several 
   milliseconds).

 - the index accesses are much more "random": the initial 256-way fan-out 
   followed by the binary search (sketched after this list) causes the 
   access patterns to look very different:

	0: 28367707
	136: 18867574
	140: 221280
	141: 745890
	142: 284427
	143: 338
	381: 9787459
	377: 394
	375: 255
	376: 248
	3344: 29885989
	3347: 334
	3346: 255
	3684: 7251911
	1055: 12954064
	1052: 386
	1050: 251
	1049: 240
	1947: 10501455
	1944: 382
	1946: 262

   where it doesn't even read ahead at all in the beginning (because it 
   looks entirely random), but the kernel eventually *does* actually go 
   into read-ahead mode pretty soon, simply because once it gets into the 
   binary search thing, the data entries are close enough to be in 
   adjacent pages, and it all looks ok.
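For reference, here is a minimal sketch (not git's actual code) of the 
lookup that produces that pattern, assuming the v1 ".idx" layout: 256 
cumulative object counts in network byte order, followed by sorted 
24-byte records of a 4-byte pack offset plus a 20-byte SHA-1. "idx" is 
assumed to point at the start of the mmap'd index file, and the function 
name is made up for the example:

	#include <string.h>
	#include <stdint.h>
	#include <arpa/inet.h>

	static uint32_t find_pack_offset(const unsigned char *idx,
					 const unsigned char *sha1)
	{
		const uint32_t *fanout = (const uint32_t *)idx;
		const unsigned char *records = idx + 256 * 4;
		unsigned int b = sha1[0];

		/*
		 * The first byte of the SHA-1 picks the bucket:
		 * fanout[b] counts all objects whose first byte is
		 * <= b, so the bucket spans [fanout[b-1], fanout[b]).
		 */
		uint32_t lo = b ? ntohl(fanout[b - 1]) : 0;
		uint32_t hi = ntohl(fanout[b]);

		/*
		 * Plain binary search within that bucket: the early
		 * probes land on pages far apart, which is why the
		 * index trace starts out looking so random.
		 */
		while (lo < hi) {
			uint32_t mi = lo + (hi - lo) / 2;
			const unsigned char *rec = records + (size_t)mi * 24;
			int cmp = memcmp(rec + 4, sha1, 20);

			if (!cmp)	/* offset is stored big-endian */
				return ntohl(*(const uint32_t *)rec);
			if (cmp < 0)
				lo = mi + 1;
			else
				hi = mi;
		}
		return 0;	/* not found; 0 is never a valid offset */
	}

The first few probes of each lookup scatter across the record array, but 
successive lookups keep revisiting the fan-out page and overlapping runs 
of records.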
As a result, by the end of the run, all the index file pages have been 
cached, and we're getting uniformly low numbers (ie in the hundreds of 
cycles, not millions):

	...
	490: 810
	489: 352
	488: 334
	2254: 907
	91: 484
	3580: 776
	962: 806
	522: 761
	2494: 514
	805: 653
	177: 495
	176: 375
	2861: 439
	649: 660
	648: 518
	...

so the pack index file access patterns are much less predictable, but 
since the pack index is so dense and relatively small, that doesn't end up 
being a problem in the long run, and only in the beginning do we pay a 
highish cost for having to read it into memory.

In other words: we do ok. I think we could do better (and an SSD would 
certainly help, since we're never going to do long and 100% contiguous 
IO), but I was expecting to see *big* trouble, and it really wasn't that 
horrid.

That said, I think there's something subtly wrong in our pack-file 
sorting, and it should be more contiguous when we just do tree object 
accesses on the top commit. I was really hoping that all the top-level 
trees would be written entirely together, but I wonder if the "write out 
deltas first" thing causes us to have those big gaps in between.

		Linus
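As an aside, a crude userspace approximation of the "pageindex: 
cycles-to-fill-in-kernel" traces above (which were presumably gathered 
with kernel-side instrumentation) is to walk a mapped file page by page 
and time the first touch of each page with rdtsc - cached and read-ahead 
pages then show up in the hundreds of cycles, real IO in the millions:

	#include <stdio.h>
	#include <stdint.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/stat.h>

	static inline uint64_t rdtsc(void)
	{
		uint32_t lo, hi;

		asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
		return ((uint64_t)hi << 32) | lo;
	}

	int main(int argc, char **argv)
	{
		long pagesize = sysconf(_SC_PAGESIZE);
		volatile unsigned char sink;
		unsigned char *map;
		struct stat st;
		off_t off;
		int fd;

		if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
			return 1;
		fstat(fd, &st);
		map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (map == MAP_FAILED)
			return 1;

		for (off = 0; off < st.st_size; off += pagesize) {
			uint64_t t0 = rdtsc();

			sink = map[off];  /* first touch faults the page in */
			printf("%lu: %llu\n",
			       (unsigned long)(off / pagesize),
			       (unsigned long long)(rdtsc() - t0));
		}
		return 0;
	}

Note that this walks the file linearly instead of following git's actual 
access order, so it shows caching and read-ahead behaviour rather than 
reproducing the traces above; for cold-cache numbers, drop the page cache 
first (echo 3 > /proc/sys/vm/drop_caches).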