On Thu, 14 Aug 2008, Nicolas Pitre wrote:
> 
> Possible.  However, the fact that both the "Compressing objects" and the 
> "Writing objects" phases during a repack (without -f) together are 
> _faster_ than the "Counting objects" phase is a sign that something is 
> more significant than cache misses here, especially when tree 
> information is a small portion of the total pack data size.

Hmm. I think I may have a clue.

The size of the delta cache seems to be a sensitive parameter for this 
thing. Not so much for the git archive, but working on the kernel tree, 
raising it to 1024 seems to give a 20% performance improvement.

That, in turn, implies that we may be unpacking things over and over 
again because of bad locality wrt delta generation. I'm not sure how easy 
something like that is to fix, though. We generate the object list in 
"recency" order for a reason, but that also happens to be the worst 
possible order for re-using the delta cache - by the time we get back to 
the next version of some tree entry, we'll have cycled through all the 
other trees, and blown all the caches, so we'll likely end up re-doing 
the whole delta chain.

So it's quite possible that what ends up happening is that some directory 
with a deep delta chain will basically end up unpacking the whole chain - 
which obviously includes inflating each delta - over and over again. 
That's what the delta cache was supposed to avoid..

Looking at some call graphs, for the kernel I get:

 - process_tree() called 10 million times

 - causing parse_tree() to be called 479,466 times (whew, so 19 out of 20 
   trees have already been seen and can be discarded)

 - which in turn calls read_sha1_file() (total: 588,110 times, but 
   there's a hundred thousand+ commits), but that actually causes

 - 588,110 calls to cache_or_unpack_entry(), out of which 5,850 calls hit 
   in the cache, and 582,260 do *not*.

IOW, the delta cache effectively never triggers, because the working set 
is _way_ bigger than the cache, and the patterns aren't good.

So since most trees are deltas, and the max delta depth is 10, the 
average depth is something like 5, and we actually get an ugly

 - 1,637,999 calls to unpack_compressed_entry()

each of which results in a zlib inflate call. So we actually have three 
times as many calls to inflate as we even have objects parsed, due to the 
delta chains on the trees (the commits almost never delta-chain at all, 
much less any deeper than a couple of entries).

So yeah, trees are the problem here, and yes, avoiding inflating them 
would help - but mainly because we do it something like four times per 
object on average! Ouch.

But we really can't just make the cache bigger, and the bad access 
patterns really are on purpose here. The delta cache was not meant for 
this; it was really meant for the "dig deeper into the history of a 
single file" kind of situation, which gets very different access patterns 
indeed.

I'll see if I can think of anything simple to avoid all this unnecessary 
work. But it doesn't look too good.

			Linus
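
To make the locality argument above concrete, here is a small standalone 
toy model. It is NOT git code: the data shape (4096 directories times 117 
versions, FIFO eviction, every resolved object getting cached, a 
256-entry cache) is purely assumed for illustration; only the roughly 
480k trees and the depth-5 chains echo the figures in the mail, and the 
mail itself only says that raising the cache to 1024 helped. The model 
just counts how many zlib-inflate-equivalent unpacks a small delta-base 
cache would need when trees are visited in "recency" order (every path at 
one version before any path at the next) versus grouped by path:

	/*
	 * Toy model, NOT git code: count how many inflates a small
	 * delta-base cache would save (or fail to save) for two
	 * different tree-visiting orders.  The shape of the data --
	 * 4096 directories, 117 versions each (~480k trees), delta
	 * chains of depth 5, a 256-entry cache with FIFO eviction,
	 * every resolved object being cached -- is an assumption
	 * chosen only to mirror the rough numbers in the mail.
	 */
	#include <stdio.h>

	#define NPATHS   4096L	/* distinct directories (assumed)       */
	#define VERSIONS 117L	/* versions per directory: ~480k trees  */
	#define DEPTH    5L	/* delta chain depth, as in the mail    */
	#define CACHE    256	/* cache entries (assumed default)      */

	static long cache[CACHE];
	static int  cache_next;
	static long inflates;

	static int cached(long id)
	{
		int i;
		for (i = 0; i < CACHE; i++)
			if (cache[i] == id)
				return 1;
		return 0;
	}

	static void remember(long id)
	{
		cache[cache_next] = id;		/* simple FIFO eviction */
		cache_next = (cache_next + 1) % CACHE;
	}

	/*
	 * Resolving (path, version) needs every link of its delta chain,
	 * from the first one still in the cache down to the chain base;
	 * each uncached link costs one inflate and then gets cached.
	 */
	static void resolve(long path, long version)
	{
		long base = version - version % DEPTH;
		long v;

		for (v = version; v >= base; v--) {
			long id = path * VERSIONS + v;
			if (cached(id))
				break;
			inflates++;
			remember(id);
		}
	}

	static void reset(void)
	{
		int i;
		for (i = 0; i < CACHE; i++)
			cache[i] = -1;
		cache_next = 0;
		inflates = 0;
	}

	int main(void)
	{
		long p, v;

		/* "Recency" order: every path at version v before any
		 * path at version v-1. */
		reset();
		for (v = VERSIONS - 1; v >= 0; v--)
			for (p = 0; p < NPATHS; p++)
				resolve(p, v);
		printf("recency order:   %ld trees, %ld inflates\n",
		       NPATHS * VERSIONS, inflates);

		/* Grouped by path: all versions of one directory before
		 * moving on to the next. */
		reset();
		for (p = 0; p < NPATHS; p++)
			for (v = VERSIONS - 1; v >= 0; v--)
				resolve(p, v);
		printf("grouped by path: %ld trees, %ld inflates\n",
		       NPATHS * VERSIONS, inflates);
		return 0;
	}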
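
With those made-up parameters the recency-order walk needs roughly three 
inflates per tree while the grouped walk needs about one, which is the 
same shape as the real numbers above: 1,637,999 unpack_compressed_entry() 
calls for 588,110 read_sha1_file() calls is about 2.8 inflates per object 
read, and with an essentially cold cache and chains averaging depth 5, a 
revisited tree sits on average in the middle of its chain and costs about 
(1+2+3+4+5)/5 = 3 inflates to re-unpack.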