On Thu, 14 Aug 2008, Linus Torvalds wrote:

> Here's a hint: the cost of a cache miss is generally about a hundred times
> the cost of just about anything else.
>
> So to make a convincing argument, you'd have to show that the actual
> memory access patterns are also much better.
>
> No, zlib isn't perfect, and nope, inflate_fast() is no "memcpy()". And
> yes, I'm sure a pure memcpy would be much faster. But I seriously suspect
> that a lot of the cost is literally in bringing in the source data to the
> CPU. Because we just mmap() the whole pack-file, the first access to the
> data is going to see the cost of the cache misses.

Possible. However, the fact that both the "Compressing objects" and the
"Writing objects" phases during a repack (without -f) together are
_faster_ than the "Counting objects" phase is a sign that something is
more significant than cache misses here, especially when tree information
is a small portion of the total pack data size.

Of course we can do further profiling, say with core.compression set to 0
and a full repack, or even by hacking the pack-objects code to force a
compression level of 0 for tree objects (and possibly commits too, since
pack v4 intends to deflate only the log text). Tree objects delta very
well, but they don't deflate well at all.

OK, so I did, and the quick test for the kernel is:

|nico@xanadu:linux-2.6> time git rev-list --all --objects > /dev/null
|
|real    0m14.737s
|user    0m14.432s
|sys     0m0.296s

That's for 1031404 objects, so we're now talking around 70k objects/sec
instead of 48k objects/sec, _only_ by taking zlib out of the equation,
and despite the fact that the pack is now larger. So I bet that
additional improvements from pack v4 could improve things even more,
including the object lookup avoidance optimization I mentioned
previously.
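For anyone who wants to reproduce the experiment, here is a minimal sketch. It builds a throwaway repository (rather than the kernel tree used above, so the absolute numbers won't match), sets core.compression to 0, and repacks with -f before walking the objects; the temp-repo setup is mine, not from the original test:

```shell
set -e
# Hypothetical throwaway repo standing in for linux-2.6.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello > file.txt
git add file.txt
git -c user.name=test -c user.email=test@example.com commit -qm init

git config core.compression 0   # store objects without deflating them
git repack -a -d -f -q          # rewrite the whole pack at level 0
git rev-list --all --objects | wc -l   # the walk that was timed in the mail
```

On the real kernel repository you would wrap the last command in `time`, as in the quoted transcript; the interesting comparison is the wall-clock time before and after the level-0 repack.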
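The throughput figure quoted above can be sanity-checked from the numbers in the transcript (1031404 objects in 14.737s of wall-clock time):

```shell
# 1031404 objects / 14.737 s ≈ 70k objects/sec, up from ~48k with deflate.
awk 'BEGIN { printf "%.0f objects/sec\n", 1031404 / 14.737 }'
# prints "69987 objects/sec"
```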
Nicolas