On Tuesday, August 06, 2013 06:24:50 am Duy Nguyen wrote:
> On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra <artagnon@xxxxxxxxx> wrote:
> > +        Garbage collect using a pseudo logarithmic packfile maintenance
> > +        approach.  This approach attempts to minimize packfile churn
> > +        by keeping several generations of varying sized packfiles around
> > +        and only consolidating packfiles (or loose objects) which are
> > +        either new packfiles, or packfiles close to the same size as
> > +        another packfile.
>
> I wonder if a simpler approach may be nearly as efficient as
> this one: keep the largest pack out, repack the rest at
> fetch/push time so there are at most 2 packs at a time.
> Or we could do the repack at 'gc --auto' time, but
> with lower pack threshold (about 10 or so). When the
> second pack is as big as, say half the size of the
> first, merge them into one at "gc --auto" time. This can
> be easily implemented in git-repack.sh.

It would definitely be better than the current gc approach. However, I
suspect it is still at least one to two orders of magnitude off from
where it should be.

To give you a real world example, on our server today when gitexproll
ran on our kernel/msm repo, it consolidated 317 pack files into one
packfile of almost 8M (it compresses/dedupes shockingly well; one of
those new packs was 33M). Our largest packfile in that repo is 1.5G!

So let's now imagine that the second largest packfile is only 100M. It
would keep getting consolidated with 8M worth of data every day
(assuming the same conditions and no extra compression). That would
take (750M-100M)/8M ~ 81 days to finally build up large enough to no
longer consolidate the new packs with the second largest pack file
daily. During those 80+ days, it will be on average writing 325M too
much per day (when it should be writing just 8M).

So I can see the appeal of a simple solution; unfortunately, I think
one layer would still "suck" though.
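For anyone who wants to play with the numbers, here is a rough sketch of
the estimate above. It is only a toy model, not anything git or
gitexproll does: the function name and parameters are made up for
illustration, 1.5G is taken as 1500M, every day is assumed to bring
exactly 8M of new data, and repacking is assumed to give no extra
compression. How much counts as "too much" depends on whether the
existing 100M is included in each day's rewrite, so the average it
prints need not match the rough 325M figure.

```python
# Toy model of the "keep the largest pack out, repack the rest" scheme.
# Assumptions (hypothetical, for illustration only): sizes are in MB,
# 1.5G is taken as 1500M, new data arrives at a fixed rate per day, and
# repacking yields no extra compression.

def simulate(largest_mb=1500, second_mb=100, daily_mb=8):
    """Return (days, average MB written per day) until the second pack
    reaches half the size of the largest pack."""
    threshold = largest_mb / 2
    days = 0
    total_written = 0
    while second_mb < threshold:
        second_mb += daily_mb       # the day's new pack is folded in
        total_written += second_mb  # the whole second pack is rewritten
        days += 1
    # Avoid dividing by zero when the second pack is already big enough.
    return days, (total_written / days if days else 0.0)


if __name__ == "__main__":
    days, avg_written = simulate()
    print(f"days until the second pack reaches the threshold: {days}")
    print(f"average MB rewritten per day: {avg_written:.0f}")
```

Changing second_mb or daily_mb shows how sensitive the wasted I/O is to
the starting size of the second pack.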
And if you are going to add even just one extra layer, I suspect that
you might as well go the full distance, since you probably already need
to implement the logic to do so?

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation