Re: [PATCH] git exproll: steps to tackle gc aggression

Martin Fick wrote:
> So, it has me wondering if there isn't a more accurate way
> to estimate the new packfile without wasting a ton of time?

I'm not sure there is. Adding the sizes of individual packs can be off
by a lot, because your deltification will be more effective if you
have more data to slide windows over and compress. For the purposes of
illustration, take a simple example:

packfile-1 has a 30M Makefile and several tiny deltas. Total = 40M.
packfile-2 has a 31M Makefile.um and several tiny deltas. Total = 40M.

Now, what is the size of packfile-3 which contains the contents of
both packfile-1 and packfile-2? 80M is a bad estimate, because you can
store deltas against just one Makefile.

So, unless you do an in-depth analysis of the objects in the packfiles
(which can be terribly expensive), I don't see how you can arrive at a
better estimate.
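
You can see the gap between the two numbers directly. Here is a
minimal sketch in Python (assuming a repository that currently has
several packs) comparing the naive sum-of-packs estimate against the
size of the single pack that "git repack" actually produces:

    import glob
    import os
    import subprocess

    def pack_sizes(gitdir=".git"):
        pattern = os.path.join(gitdir, "objects/pack/*.pack")
        return {p: os.path.getsize(p) for p in glob.glob(pattern)}

    before = pack_sizes()
    print("naive estimate (sum of packs):", sum(before.values()))

    # -a repacks everything into a single pack, -d drops the old
    # packs, and -f recomputes deltas from scratch, so objects that
    # used to live in different packs can delta against each other.
    subprocess.run(["git", "repack", "-a", "-d", "-f"], check=True)

    after = pack_sizes()
    print("actual consolidated size:", sum(after.values()))

The more redundancy there is across the packs, the further the first
number will be from the second.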

> If not, one approach which might be worth experimenting with
> is to just assume that new packfiles have size 0!  Then just
> consolidate them with any other packfile which is ready for
> consolidation, or if none are ready, with the smallest
> packfile.  I would not be surprised to see this work on
> average better than the current summation,

That assumes all fetches (and pushes) are small, which is probably a
sound rule of thumb; you might want a "smallness threshold" to catch
the occasional large one, although I haven't thought hard about the
problem.
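
If someone wants to experiment with that, here is a minimal sketch of
the selection logic in Python; the function name, the "ready"
collection, and the 1M default threshold are all hypothetical, just to
make the idea concrete:

    def pick_consolidation_target(new_size, packs, ready=(),
                                  small_threshold=1 << 20):
        """Pick a pack to fold a new pack into, pretending that small
        new packs have size 0.

        packs maps pack name -> size in bytes; ready is the collection
        of packs already due for consolidation under whatever rule the
        roller uses.  All names and the 1M default threshold are made
        up for illustration.
        """
        if new_size >= small_threshold:
            return None                  # big pack: use the normal estimate
        if ready:
            return min(ready, key=packs.get)   # join a pending consolidation
        return min(packs, key=packs.get)       # otherwise, the smallest pack

    # For example:
    packs = {"pack-a.pack": 40 << 20, "pack-b.pack": 400 << 20}
    print(pick_consolidation_target(200 << 10, packs))   # -> pack-a.pack

Whatever the function returns would be repacked together with the new
pack; returning None means falling back to the usual size-based rule.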