On Wed, Aug 7, 2013 at 7:10 AM, Martin Fick <mfick@xxxxxxxxxxxxxx> wrote:
>> I wonder if a simpler approach may be nearly as efficient
>> as this one: keep the largest pack out, repack the rest at
>> fetch/push time so there are at most 2 packs at a time.
>> Or we could do the repack at 'gc --auto' time, but with a
>> lower pack threshold (about 10 or so). When the second
>> pack is as big as, say, half the size of the first, merge
>> them into one at "gc --auto" time. This can be easily
>> implemented in git-repack.sh.
>
> It would definitely be better than the current gc approach.
>
> However, I suspect it is still at least one to two orders of
> magnitude off from where it should be. To give you a real
> world example: on our server today, when gitexproll ran on
> our kernel/msm repo, it consolidated 317 pack files into one
> packfile of almost 8M (it compresses/dedupes shockingly
> well; one of those new packs was 33M). Our largest packfile
> in that repo is 1.5G!
>
> So let's now imagine that the second closest packfile is
> only 100M. It would keep getting consolidated with 8M worth
> of data every day (assuming the same conditions and no
> extra compression). That would take (750M-100M)/8M ~ 81
> days (750M being half of the 1.5G pack) to finally build up
> large enough to no longer consolidate the new packs with
> the second largest pack file daily. During those 80+ days,
> it will on average be writing 325M too much per day (when
> it should be writing just 8M).
>
> So I can see the appeal of a simple solution; unfortunately,
> I think one layer would still "suck". And if you are going
> to add even just one extra layer, I suspect that you might
> as well go the full distance, since you probably already
> need to implement the logic to do so?

I see. It looks like your way is the best way to go.
-- 
Duy
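
P.S. To make the "repack everything except the largest pack" step a bit
more concrete, here is a rough sketch on top of the existing .keep
mechanism ("git repack -a -d" leaves packs that have a .keep file alone).
The paths, the size check, and the one-half threshold are only
illustrative assumptions, not anything that exists in git-repack.sh today:

  #!/bin/sh
  # Sketch only: run inside a (bare) repository's pack directory.
  cd /path/to/repo.git/objects/pack || exit 1

  # Mark the largest pack as kept so the repack below leaves it alone.
  largest=$(ls -S pack-*.pack | head -n 1)
  touch "${largest%.pack}.keep"

  # Roll loose objects and all the smaller packs into one new pack;
  # -d deletes the packs that became redundant.
  git repack -a -d -q

  # Once the consolidated pack reaches half the size of the kept pack,
  # drop the .keep so the next repack merges the two into one.
  second=$(ls -S pack-*.pack | sed -n 2p)
  if [ -n "$second" ] &&
     [ $(wc -c < "$second") -ge $(( $(wc -c < "$largest") / 2 )) ]; then
          rm -f "${largest%.pack}.keep"
          git repack -a -d -q
  fi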