Re: Git performance results on a large repository

On Mon, 6 Feb 2012, Joshua Redstone wrote:

> David Lang and David Barr, I generated the pack files by doing a repack:
> "git repack -a -d -f --max-pack-size=10g --depth=100 --window=250" after
> generating the repo.

how many pack files does this end up creating?
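
a quick way to count them, assuming a standard .git layout, is either of:

   git count-objects -v                  # reports a "packs: N" line
   ls .git/objects/pack/*.pack | wc -l   # or count the pack files directly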

I think that doing a full repack the way you did will group all revisions of a given file into the same pack.

while what I'm saying is that if you create the packs based on time, rather than on the space efficiency of the resulting pack files, you may end up not having to go through as much data when doing things like a git blame.

what you did was

initialize repo
4M commits
repack

what I'm saying is

initialize repo
loop
   500K commits
   repack (and mark the new pack with a .keep file so later repacks leave it alone)
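
as a concrete sketch of that loop (the commit-generation step below is just a placeholder for however you replay the history):

   git init bigrepo && cd bigrepo
   for i in $(seq 1 8); do
       # ...apply the next batch of ~500K commits here...
       # -a skips objects already in packs marked with .keep,
       # so each pass only packs the new commits
       git repack -a -d --max-pack-size=10g --depth=100 --window=250
       # mark the resulting pack(s) so later repacks leave them alone
       for p in .git/objects/pack/pack-*.pack; do
           [ -e "${p%.pack}.keep" ] || touch "${p%.pack}.keep"
       done
   done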

so you will end up with ~8 sets of pack files, but they are time-based, so when you only need recent information you only look at the most recent pack file. If you need to go back through all of history, the multiple pack files will be a little more expensive to process.

this has the added advantage that the 8 small repacks should be cheaper than the one large repack, as each run isn't trying to cover all 4M commits.

David Lang

