Re: Performance issue of 'git branch'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Wed, 22 Jul 2009, Linus Torvalds wrote:
> 
> The GIT_DEBUG_LOOKUP debug output probably does match the number of 
> cold-cache IO's fairly well for something like this (at least to a first 
> approximation), so I really hope my patch will fix your problem.

Side note: the object lookup binary search we do is simple and reasonably 
efficient, but it is _not_ very cache-friendly (where "cache-friendly" 
also in this case means IO caches).

There are more cache-friendly ways of searching, although the really 
clever ones would require us to switch the format of the pack-file index 
around. Which would be a fairly big pain (in addition to making the lookup 
a lot more complex).

The _simpler_ cache-friendly alternative is likely to try the "guess 
location by assuming the SHA1's are evenly spread out" thing doesn't jump 
back-and-forth like a binary search does.

We tried it a few years ago, but didn't do cold-cache numbers. And 
repositories were smaller too.

With something like the kernel repo, with 1.2+ million objects, a binary 
search needs about 21 comparisons for each object we look up. The index 
has a first-level fan-out of 256, so that takes away 8 of them, but we're 
still talking about 13 comparisons. With bad locality except for the very 
last ones.

Assuming a 4kB page-size, and about 170 index entries per page (~7 binary 
search levels), that's 6 pages we have to page-fault in for each search. 
And we probably won't start seeing lots of cache reuse until we hit 
hundreds or thousands of objects searched for.

With soemthing like "three iterations of newton-raphson + linear search", 
we might end up with more index entries looked at, but we'd quite possibly 
get much better locality.

I suspect the old newton-raphson patches we had (Discussions and patches 
back in April 2007 on this list) could be resurrected pretty easily.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]