On Wed, 28 Nov 2007, Linus Torvalds wrote:

>  - the index accesses are much more "random": the initial 256-way fan-out
>    followed by the binary search causes the access patterns to look very
>    different:
>
>       0: 28367707
>     136: 18867574
>     140: 221280
>     141: 745890
>     142: 284427
>     143: 338
>     381: 9787459
>     377: 394
>     375: 255
>     376: 248
>    3344: 29885989
>    3347: 334
>    3346: 255
>    3684: 7251911
>    1055: 12954064
>    1052: 386
>    1050: 251
>    1049: 240
>    1947: 10501455
>    1944: 382
>    1946: 262
>
>    where it doesn't even read-ahead at all in the beginning (because it
>    looks entirely random), but the kernel eventually *does* actually go
>    into read-ahead mode pretty soon simply because once it gets into the
>    binary search thing, the data entries are close enough to be in
>    adjacent pages, and it all looks ok.

Did you try with version 2 of the pack index?  It should have somewhat
better locality, as the object SHA1s and their offsets are split into
separate tables.

> That said, I think there's something subtly wrong in our pack-file
> sorting, and it should be more contiguous when we just do tree object
> accesses on the top commit. I was really hoping that all the top-level
> trees should be written entirely together, but I wonder if the "write out
> deltas first" thing causes us to have those big gaps in between.

Tree objects aren't all together.  Related blob objects are interlaced
with those tree objects.  But for a checkout that should actually
correspond to a nice linear access.

And deltas aren't written first; their base objects are.  And because
deltas are based on newer objects, in theory the top commit shouldn't
have any delta at all, and the second commit should have all the base
objects for its deltas already written out as part of the first commit.

At least that's what a perfect data set would produce.  Last time I
checked, about 20% of the deltas happened to go in the other direction,
i.e. the deltified object was younger than its base object, most
probably because the new version of the file shrank instead of growing,
which goes against the assumption in the delta search object sort.  But
again, because the base object is needed to resolve the delta, it will
be read anyway.


Nicolas
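
For reference, the lookup pattern discussed above (a 256-way fan-out,
then a binary search over sorted object names, with the offsets kept in
a separate table as in pack index version 2) can be sketched roughly as
below.  This is an in-memory toy in Python, not git's actual C code:
build_index, lookup, and the sample objects are hypothetical names made
up for illustration, and the header, CRC32 table, and trailing checksums
of the real on-disk format are left out.

```python
import bisect
import hashlib


def build_index(entries):
    """Build a toy v2-style index from (sha1_bytes, pack_offset) pairs.

    Returns (fanout, names, offsets), where names and offsets are two
    parallel tables sorted by object name.
    """
    ordered = sorted(entries)
    names = [name for name, _ in ordered]
    offsets = [offset for _, offset in ordered]

    # fanout[b] = number of objects whose first name byte is <= b
    fanout = [0] * 256
    for name in names:
        fanout[name[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout, names, offsets


def lookup(index, name):
    """Return the pack offset for name, or None if it is not indexed."""
    fanout, names, offsets = index
    first = name[0]
    # The fan-out table narrows the search to names sharing the first byte.
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    # The binary search touches only the table of names...
    pos = bisect.bisect_left(names, name, lo, hi)
    if pos < hi and names[pos] == name:
        # ...and the offsets table is consulted once, on a hit.
        return offsets[pos]
    return None


# Hypothetical sample data: 1000 fake object names with made-up offsets.
objects = [(hashlib.sha1(b"object %d" % i).digest(), 12 + 100 * i)
           for i in range(1000)]
index = build_index(objects)
```

With the version 1 layout each 20-byte name is stored next to its 4-byte
offset, so the binary search strides over 24-byte records; in the sketch
above, as in version 2, it walks only the table of names.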