On Thu, 14 Aug 2008, Nicolas Pitre wrote:
> 
> Possible.  However, the fact that both the "Compressing objects" and the 
> "Writing objects" phases during a repack (without -f) together are 
> _faster_ than the "Counting objects" phase is a sign that something is 
> more significant than cache misses here, especially when tree 
> information is a small portion of the total pack data size.

Hmm. I think I may have a clue.

The size of the delta cache seems to be a sensitive parameter for this 
thing. Not so much for the git archive, but working on the kernel tree, 
raising it to 1024 seems to give a 20% performance improvement.

That, in turn, implies that we may be unpacking things over and over 
again because of bad locality wrt delta generation. I'm not sure how easy 
something like that is to fix, though. We generate the object list in 
"recency" order for a reason, but that also happens to be the worst 
possible order for re-using the delta cache - by the time we get back to 
the next version of some tree entry, we'll have cycled through all the 
other trees, and blown all the caches, so we'll likely end up re-doing 
the whole delta chain.

So it's quite possible that what ends up happening is that some directory 
with a deep delta chain will basically end up unpacking the whole chain - 
which obviously includes inflating each delta - over and over again. 
That's what the delta cache was supposed to avoid..

Looking at some call graphs, for the kernel I get:

 - process_tree() called 10 million times

 - causing parse_tree() to be called 479,466 times (whew, so 19 out of 20 
   trees have already been seen and can be discarded)

 - which in turn calls read_sha1_file() (total: 588,110 times, but 
   there's a hundred thousand+ commits), but that actually causes

 - 588,110 calls to cache_or_unpack_entry(), out of which 5,850 calls hit 
   in the cache, and 582,260 do *not*.

IOW, the delta cache effectively never triggers, because the working set 
is _way_ bigger than the cache, and the patterns aren't good.

So since most trees are deltas, and the max delta depth is 10, the 
average depth is something like 5, and we actually get an ugly

 - 1,637,999 calls to unpack_compressed_entry()

each of which results in a zlib inflate call. So we actually have three 
times as many calls to inflate as we even have objects parsed, due to the 
delta chains on the trees (the commits almost never delta-chain at all, 
much less any deeper than a couple of entries).

So yeah, trees are the problem here, and yes, avoiding inflating them 
would help - but mainly because we do it something like four times per 
object on average! Ouch.

But we really can't just make the cache bigger, and the bad access 
patterns really are on purpose here. The delta cache was not meant for 
this; it was really meant for the "dig deeper into the history of a 
single file" kind of situation, which gets very different access patterns 
indeed.

I'll see if I can think of anything simple to avoid all this unnecessary 
work. But it doesn't look too good.

			Linus
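
To make the locality argument above concrete, here is a small standalone 
toy model. It is NOT git code: the data shape (4096 directories times 117 
versions, FIFO eviction, every resolved object getting cached, a 
256-entry cache) is purely assumed for illustration; only the roughly 
480k trees and the depth-5 chains echo the figures in the mail, and the 
mail itself only says that raising the cache to 1024 helped. The model 
just counts how many zlib-inflate-equivalent unpacks a small delta-base 
cache would need when trees are visited in "recency" order (every path at 
one version before any path at the next) versus grouped by path:

	/*
	 * Toy model, NOT git code: count how many inflates a small
	 * delta-base cache would save (or fail to save) for two
	 * different tree-visiting orders.  The shape of the data --
	 * 4096 directories, 117 versions each (~480k trees), delta
	 * chains of depth 5, a 256-entry cache with FIFO eviction,
	 * every resolved object being cached -- is an assumption
	 * chosen only to mirror the rough numbers in the mail.
	 */
	#include <stdio.h>

	#define NPATHS   4096L	/* distinct directories (assumed)       */
	#define VERSIONS 117L	/* versions per directory: ~480k trees  */
	#define DEPTH    5L	/* delta chain depth, as in the mail    */
	#define CACHE    256	/* cache entries (assumed default)      */

	static long cache[CACHE];
	static int  cache_next;
	static long inflates;

	static int cached(long id)
	{
		int i;
		for (i = 0; i < CACHE; i++)
			if (cache[i] == id)
				return 1;
		return 0;
	}

	static void remember(long id)
	{
		cache[cache_next] = id;		/* simple FIFO eviction */
		cache_next = (cache_next + 1) % CACHE;
	}

	/*
	 * Resolving (path, version) needs every link of its delta chain,
	 * from the first one still in the cache down to the chain base;
	 * each uncached link costs one inflate and then gets cached.
	 */
	static void resolve(long path, long version)
	{
		long base = version - version % DEPTH;
		long v;

		for (v = version; v >= base; v--) {
			long id = path * VERSIONS + v;
			if (cached(id))
				break;
			inflates++;
			remember(id);
		}
	}

	static void reset(void)
	{
		int i;
		for (i = 0; i < CACHE; i++)
			cache[i] = -1;
		cache_next = 0;
		inflates = 0;
	}

	int main(void)
	{
		long p, v;

		/* "Recency" order: every path at version v before any
		 * path at version v-1. */
		reset();
		for (v = VERSIONS - 1; v >= 0; v--)
			for (p = 0; p < NPATHS; p++)
				resolve(p, v);
		printf("recency order:   %ld trees, %ld inflates\n",
		       NPATHS * VERSIONS, inflates);

		/* Grouped by path: all versions of one directory before
		 * moving on to the next. */
		reset();
		for (p = 0; p < NPATHS; p++)
			for (v = VERSIONS - 1; v >= 0; v--)
				resolve(p, v);
		printf("grouped by path: %ld trees, %ld inflates\n",
		       NPATHS * VERSIONS, inflates);
		return 0;
	}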
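
With those made-up parameters the recency-order walk needs roughly three 
inflates per tree while the grouped walk needs about one, which is the 
same shape as the real numbers above: 1,637,999 unpack_compressed_entry() 
calls for 588,110 read_sha1_file() calls is about 2.8 inflates per object 
read, and with an essentially cold cache and chains averaging depth 5, a 
revisited tree sits on average in the middle of its chain and costs about 
(1+2+3+4+5)/5 = 3 inflates to re-unpack.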