On Thu, 14 Aug 2008, Linus Torvalds wrote:

> Here's a hint: the cost of a cache miss is generally about a hundred times
> the cost of just about anything else.
>
> So to make a convincing argument, you'd have to show that the actual
> memory access patterns are also much better.
>
> No, zlib isn't perfect, and nope, inflate_fast() is no "memcpy()". And
> yes, I'm sure a pure memcpy would be much faster. But I seriously suspect
> that a lot of the cost is literally in bringing in the source data to the
> CPU. Because we just mmap() the whole pack-file, the first access to the
> data is going to see the cost of the cache misses.

Possible. However, the fact that both the "Compressing objects" and the
"Writing objects" phases during a repack (without -f) together are
_faster_ than the "Counting objects" phase is a sign that something is
more significant than cache misses here, especially when tree information
is a small portion of the total pack data size.

Of course we can do further profiling, say with core.compression set to 0
and a full repack, or even by hacking the pack-objects code to force a
compression level of 0 for tree objects (and possibly commits too, since
pack v4 intends to deflate only the log text). Tree objects delta very
well, but they don't deflate well at all.

OK, so I did, and the quick test for the kernel is:

|nico@xanadu:linux-2.6> time git rev-list --all --objects > /dev/null
|
|real    0m14.737s
|user    0m14.432s
|sys     0m0.296s

That's for 1031404 objects, so we're now talking around 70k objects/sec
instead of 48k objects/sec, _only_ by taking zlib out of the equation,
and despite the fact that the pack is now larger. So I bet that
additional improvements from pack v4 could improve things even more,
including the object lookup avoidance optimization I mentioned
previously.
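For anyone who wants to reproduce the experiment, here is a minimal sketch. It builds a throwaway repository (rather than the kernel tree used above, so the absolute numbers won't match), sets core.compression to 0, and repacks with -f before walking the objects; the temp-repo setup is mine, not from the original test:

```shell
set -e
# Hypothetical throwaway repo standing in for linux-2.6.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello > file.txt
git add file.txt
git -c user.name=test -c user.email=test@example.com commit -qm init

git config core.compression 0   # store objects without deflating them
git repack -a -d -f -q          # rewrite the whole pack at level 0
git rev-list --all --objects | wc -l   # the walk that was timed in the mail
```

On the real kernel repository you would wrap the last command in `time`, as in the quoted transcript; the interesting comparison is the wall-clock time before and after the level-0 repack.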
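The throughput figure quoted above can be sanity-checked from the numbers in the transcript (1031404 objects in 14.737s of wall-clock time):

```shell
# 1031404 objects / 14.737 s ≈ 70k objects/sec, up from ~48k with deflate.
awk 'BEGIN { printf "%.0f objects/sec\n", 1031404 / 14.737 }'
# prints "69987 objects/sec"
```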
Nicolas