Re: git gc --aggressive led to about 40 times slower "git log --raw"

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> I can think of two improvements we could make, either increase cache
> size dynamically (within limits) or make it configurable. If we have N
> entries in worktree (both trees and blobs) and depth M, then we might
> need to cache N*M objects for it to be effective. Christian, if you
> want to experiment with this, update MAX_DELTA_CACHE in sha1_file.c and
> rebuild.
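
For anyone wanting to reproduce this: the experiment boils down to
bumping one compile-time constant and rebuilding.  The exact context in
sha1_file.c differs between git versions, so take this only as a sketch
of the knob in question:

    /* sha1_file.c (sketch) -- size of the delta base cache */
    #define MAX_DELTA_CACHE (256)   /* rebuilt here with 512, 1024, 2048, 16384 */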

Well, my optimized "git-blame" code takes a considerable hit on an
aggressively packed Emacs repository, so I took a look at it with the
MAX_DELTA_CACHE value set to the default 256, and then to 512, 1024 and
2048.

Here are the results, one run per value in that order (256, 512, 1024,
2048):

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m17.496s
user	0m30.552s
sys	0m46.496s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m13.888s
user	0m30.060s
sys	0m43.420s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m16.415s
user	0m31.436s
sys	0m44.564s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m24.732s
user	0m34.416s
sys	0m49.808s

So using a value of 512 helps a bit (7% or so), but further increases
already cause a hit.  My machine has 4G of memory (32-bit x86), so it is
unlikely that memory is running out.  I have no idea why this would be
so: either memory locality plays a role here, or the cache for some
reason gets reinitialized or scanned/copied/accessed as a whole
repeatedly, defeating the idea of a cache.  Or the access patterns are
such that it is entirely useless as a cache even at this size.
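
To make that last guess more concrete, here is a purely illustrative
sketch (not git's actual code) of a fixed-size, hash-indexed cache where
a colliding insert simply evicts the previous occupant.  With such a
scheme, enlarging the table only pays off if the lookups actually spread
across the additional slots, and any maintenance that walks the whole
array grows with the table size:

    /* Illustration only -- NOT git's delta base cache implementation. */
    #include <stdlib.h>

    #define MAX_DELTA_CACHE 256              /* hypothetical size knob */

    struct cache_entry {
        unsigned long key;                   /* stand-in for pack offset/object id */
        void *data;
        unsigned long size;
    };

    static struct cache_entry cache[MAX_DELTA_CACHE];

    static unsigned long slot_of(unsigned long key)
    {
        return key % MAX_DELTA_CACHE;        /* direct-mapped: one slot per key */
    }

    static void *cache_lookup(unsigned long key)
    {
        struct cache_entry *e = &cache[slot_of(key)];
        return (e->data && e->key == key) ? e->data : NULL;
    }

    static void cache_store(unsigned long key, void *data, unsigned long size)
    {
        struct cache_entry *e = &cache[slot_of(key)];
        free(e->data);                       /* collision: evict whoever was here */
        e->key = key;
        e->data = data;
        e->size = size;
    }

If the blame workload keeps hammering a small set of slots, the extra
entries are dead weight; and if eviction or invalidation ever scans the
whole array, a bigger table actively costs time instead of saving it.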

Trying with 16384:
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	2m8.000s
user	0m54.968s
sys	1m12.624s

And memory consumption did not exceed about 200m all the while, which
is far lower than what would have been available.

Something's _really_ fishy about that cache behavior.  Note that the
_system_ time goes up considerably, not just the user time.  Since the
packs are zlib-compressed, it is reasonable that more I/O time comes
with more user time as well, and it is quite possible that the increase
in user time is entirely explained by the larger amount of compressed
data that has to be accessed.
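
For what it's worth, the user/system split and the peak memory figure
can be cross-checked independently of the shell's time builtin with a
tiny getrusage() wrapper; this is just a measurement aid (the command
string is the one from the runs above, adjust as needed), nothing to do
with git's internals:

    /* measure.c -- run the blame under test and report its resource usage */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* The command under test; system() waits for it to finish. */
        system("../git/git blame src/xdisp.c > /dev/null");

        struct rusage ru;
        getrusage(RUSAGE_CHILDREN, &ru);
        printf("user   %ld.%06ld s\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
        printf("system %ld.%06ld s\n",
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        printf("maxrss %ld kB\n", ru.ru_maxrss);   /* peak resident set of the child */
        return 0;
    }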

But this stinks.  I doubt that the additional time is spent in memory
allocation: most of that would register only as user time.  And the
total amount of allocated memory is not large enough to explain this
away with fewer disk buffers being available to the kernel: the
aggressively packed repo takes about 300m, so it would fit into memory
together with the git process.

-- 
David Kastrup