Re: Some git performance measurements..

Nicolas Pitre <nico@xxxxxxx> · Wed, 28 Nov 2007 22:59:37 -0500 (EST)

On Wed, 28 Nov 2007, Linus Torvalds wrote:

>  - the index accesses are much more "random": the initial 256-way fan-out 
>    followed by the binary search causes the access patterns to look very 
>    different:
> 
> 	0: 28367707
> 	136: 18867574
> 	140: 221280
> 	141: 745890
> 	142: 284427
> 	143: 338
> 	381: 9787459
> 	377: 394
> 	375: 255
> 	376: 248
> 	3344: 29885989
> 	3347: 334
> 	3346: 255
> 	3684: 7251911
> 	1055: 12954064
> 	1052: 386
> 	1050: 251
> 	1049: 240
> 	1947: 10501455
> 	1944: 382
> 	1946: 262
> 
>    where it doesn't even read-ahead at all in the beginning (because it 
>    looks entirely random), but the kernel eventually *does* actually go 
>    into read-ahead mode pretty soon simply because once it gets into the 
>    binary search thing, the data entries are close enough to be in 
>    adjacent pages, and it all looks ok.
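The lookup Linus describes can be sketched roughly as follows. This is an in-memory Python illustration, not git's actual code. The helper names `build_fanout` and `find_object` are hypothetical. A 256-entry fan-out table narrows the search to the objects whose SHA1 starts with the same byte, and a plain binary search then runs over that slice of the sorted name list.

```python
import hashlib


def build_fanout(sorted_names):
    """fanout[b] = number of object names whose first byte is <= b."""
    fanout = [0] * 256
    for name in sorted_names:
        fanout[name[0]] += 1
    running = 0
    for b in range(256):
        running += fanout[b]
        fanout[b] = running
    return fanout


def find_object(sorted_names, fanout, sha1):
    """Return the index of sha1 in sorted_names, or None if absent."""
    first = sha1[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search over [lo, hi): the early probes are the "random"
    # accesses; only the last few land on nearby pages.
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_names[mid] == sha1:
            return mid
        if sorted_names[mid] < sha1:
            lo = mid + 1
        else:
            hi = mid
    return None


names = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(1000))
fanout = build_fanout(names)
target = hashlib.sha1(b"42").digest()
print(find_object(names, fanout, target) == names.index(target))  # True
```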

Did you try with version 2 of the pack index?  It should have somewhat
better locality, since the object SHA1s and their offsets are split
into separate tables.
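To make the locality argument a bit more concrete, here is a rough Python simulation. It is not git code. The object names are random stand-ins, and the helper names are hypothetical. It counts the distinct 4 KiB pages touched by the SHA1 comparisons of a fan-out-bounded binary search under two layouts. In v1, the 1024-byte fan-out is followed by 24-byte entries, each a 4-byte pack offset plus a 20-byte SHA1. In v2, an 8-byte header and the fan-out are followed by a table of bare 20-byte SHA1s, with the CRC32s and offsets stored in later tables. Only the search phase is modelled. v2 then needs one extra read into its offset table per lookup, and that read is not counted here.

```python
import bisect
import random

PAGE = 4096
FANOUT_BYTES = 256 * 4  # 256 four-byte counts in both versions


def v1_name_offset(i):
    # v1: fan-out, then 24-byte entries (4-byte pack offset + 20-byte SHA1).
    return FANOUT_BYTES + 24 * i + 4


def v2_name_offset(i):
    # v2: 8-byte header (magic + version), fan-out, then bare 20-byte SHA1s;
    # CRC32s and pack offsets live in later, separate tables.
    return 8 + FANOUT_BYTES + 20 * i


def pages_touched(names, target, name_offset):
    """Distinct pages read by the SHA1 comparisons of one lookup."""
    pages = {0}  # the fan-out table sits in the first page either way
    first = target[0]
    # The [lo, hi) range the fan-out table would hand us.
    lo = bisect.bisect_left(names, bytes([first]))
    hi = bisect.bisect_left(names, bytes([first + 1])) if first < 255 else len(names)
    while lo < hi:
        mid = (lo + hi) // 2
        pages.add(name_offset(mid) // PAGE)
        if names[mid] == target:
            break
        if names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return len(pages)


random.seed(1)
names = sorted({random.getrandbits(160).to_bytes(20, "big") for _ in range(100_000)})
lookups = random.sample(names, 2000)
v1 = sum(pages_touched(names, t, v1_name_offset) for t in lookups)
v2 = sum(pages_touched(names, t, v2_name_offset) for t in lookups)
print(f"v1 search pages: {v1}, v2 search pages: {v2}")
```

The v2 total should come out lower. About 204 names fit in a 4 KiB page instead of about 170, so the tail end of each binary search more often stays within a single page.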

> That said, I think there's something subtly wrong in our pack-file 
> sorting, and it should be more contiguous when we just do tree object 
> accesses on the top commit. I was really hoping that all the top-level 
> trees should be written entirely together, but I wonder if the "write out 
> deltas first" thing causes us to have those big gaps in between.

Tree objects aren't all stored together: related blob objects are
interlaced with them.  But for a checkout, that should actually
translate into a nice linear access pattern.
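Here is a toy Python model of that layout. For illustration only, it assumes the pack holds the top commit's objects in depth-first walk order, with each tree followed by its entries, and it uses a list position in place of the pack offset. The repository contents and helper names are made up, and `sorted()` stands in for git's own tree entry order. In this model, reading only the trees skips forward over the interlaced blobs, while a checkout, which needs every object, reads the slots strictly in sequence.

```python
def write_order(tree, path=""):
    """Depth-first pre-order: a tree, then its entries (subtrees recurse)."""
    yield path or "/", "tree"
    for name in sorted(tree):
        child = tree[name]
        child_path = f"{path}/{name}"
        if isinstance(child, dict):
            yield from write_order(child, child_path)
        else:
            yield child_path, "blob"


# Hypothetical top-commit tree; strings stand in for blob contents.
repo = {
    "Makefile": "...",
    "fs": {"inode.c": "...", "ext2": {"super.c": "..."}},
    "kernel": {"fork.c": "...", "sched.c": "..."},
}

order = list(write_order(repo))
slot = {path: i for i, (path, _) in enumerate(order)}  # position in the pack

# Tree-only access (e.g. walking the top commit's trees) vs. a checkout,
# which visits every object in the same walk order.
tree_reads = [slot[p] for p, kind in order if kind == "tree"]
checkout_reads = [slot[p] for p, _ in order]


def gaps(reads):
    return [b - a for a, b in zip(reads, reads[1:])]


print("tree-only gaps:", gaps(tree_reads))      # [2, 1, 3]
print("checkout gaps: ", gaps(checkout_reads))  # [1, 1, 1, 1, 1, 1, 1, 1]
```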

And it is not the deltas that are written first, but rather their base
objects.  And because deltas are based on newer objects, in theory the
top commit shouldn't have any deltas at all, and the second commit
should have all the base objects for its deltas already written out as
part of the first commit.  At least that's what a perfect data set
would produce.  Last time I checked, about 20% of the deltas happened
to go in the other direction, i.e. the deltified object was younger
than its base object.  Most probably this is because the new version of
the file shrank instead of growing, which goes against the assumption
behind the object sort used for the delta search.  But again, because
the base object is needed to resolve the delta, it will be read anyway.
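The base-before-delta rule can be sketched as a recursive writer. This is a simplified Python illustration, not git's pack-objects code. Objects arrive in recency order (newest first), and each object optionally names its delta base. Chains are assumed to be acyclic. The `other@v2` entry is hypothetical and models one of those "wrong direction" deltas: the newer version shrank and was deltified against its older base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    name: str
    base: Optional[str] = None  # delta base name, or None if stored whole


def write_pack(objects):
    """Return the order objects are written in: every base before its deltas.

    Assumes delta chains are acyclic, as they are in a valid pack.
    """
    by_name = {o.name: o for o in objects}
    written, done = [], set()

    def write_one(obj):
        if obj.name in done:
            return
        if obj.base is not None:
            write_one(by_name[obj.base])  # the base must precede the delta
        done.add(obj.name)
        written.append(obj.name)

    for obj in objects:  # objects arrive in recency order, newest first
        write_one(obj)
    return written


objects = [
    Obj("file@v3"),                    # newest version stored whole
    Obj("file@v2", base="file@v3"),    # older versions are deltas
    Obj("file@v1", base="file@v2"),
    Obj("other@v2", base="other@v1"),  # file shrank: delta points backwards
    Obj("other@v1"),
]
print(write_pack(objects))
# ['file@v3', 'file@v2', 'file@v1', 'other@v1', 'other@v2']
```

`other@v1` gets pulled ahead of `other@v2` because the delta cannot be resolved without it. Whichever way a delta points in time, its base always lands earlier in the pack.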

Nicolas