Re: Git performance results on a large repository

On Mon, 6 Feb 2012, Joshua Redstone wrote:

> David Lang and David Barr, I generated the pack files by doing a repack:
> "git repack -a -d -f --max-pack-size=10g --depth=100 --window=250" after
> generating the repo.

how many pack files does this end up creating?
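
a quick way to count them, assuming a standard .git layout, is either of:

   git count-objects -v                  # reports a "packs: N" line
   ls .git/objects/pack/*.pack | wc -l   # or count the pack files directly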

I think that doing a full repack the way you did will group all revisions of a given file into the same pack.

while what I'm saying is that if you create the packs based on time, rather than on the space efficiency of the resulting pack files, you may end up not having to go through as much data when doing things like a git blame.

what you did was

initialize repo
4M commits
repack

what I'm saying is

initialize repo
loop
   500K commits
   repack (and mark the new pack with a .keep file so later repacks leave it alone)
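
as a concrete sketch of that loop (the commit-generation step below is just a placeholder for however you replay the history):

   git init bigrepo && cd bigrepo
   for i in $(seq 1 8); do
       # ...apply the next batch of ~500K commits here...
       # -a skips objects already in packs marked with .keep,
       # so each pass only packs the new commits
       git repack -a -d --max-pack-size=10g --depth=100 --window=250
       # mark the resulting pack(s) so later repacks leave them alone
       for p in .git/objects/pack/pack-*.pack; do
           [ -e "${p%.pack}.keep" ] || touch "${p%.pack}.keep"
       done
   done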

so you will end up with ~8 sets of pack files, but they are time-based, so when you only need recent information you only look at the most recent pack file. If you need to go back through all of history, the multiple pack files will be a little more expensive to process.

this has the added advantage that the 8 small repacks should be cheaper than the one large repack, as each run isn't trying to cover all 4M commits.

David Lang

