On Thu, 12 Jan 2012, Jeff King wrote:

> On Thu, Jan 12, 2012 at 05:15:23PM -0500, Jeff King wrote:
>
> > It turns out that when packing a subset of a fully packed repo (as we do
> > for a bundle or for a fetch), we tend not to make thin packs at all.
> > The culprit is this logic in try_delta:
> >
> >     /*
> >      * We do not bother to try a delta that we discarded
> >      * on an earlier try, but only when reusing delta data.
> >      */
> >     if (reuse_delta && trg_entry->in_pack &&
> >         trg_entry->in_pack == src_entry->in_pack &&
> >         trg_entry->in_pack_type != OBJ_REF_DELTA &&
> >         trg_entry->in_pack_type != OBJ_OFS_DELTA)
> >             return 0;
> > [...]
> > Maybe it is enough to simply turn off this optimization if the potential
> > delta source is not being included in the pack (i.e., we are using
> > --thin and it is a boundary object). Because if both objects are being
> > sent, we will just end up reusing the delta that goes in the reverse
> > direction anyway.
>
> Hmm. It turns out this is really easy, because we have already marked
> such objects as preferred bases.

That's exactly what I was about to suggest after reading your first email.

> So with this patch:
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 96c1680..d05e228 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1439,6 +1439,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
>  	 */
>  	if (reuse_delta && trg_entry->in_pack &&
>  	    trg_entry->in_pack == src_entry->in_pack &&
> +	    !src_entry->preferred_base &&
>  	    trg_entry->in_pack_type != OBJ_REF_DELTA &&
>  	    trg_entry->in_pack_type != OBJ_OFS_DELTA)
>  		return 0;

Acked-by: Nicolas Pitre <nico@xxxxxxxxxxx>

> here are the numbers I get:
>
>                      dataset
>             | fetches | tags
> ---------------------------------
>      before | 53358   | 2750977
> size  after | 32398   | 2668479
>      change | -39%    | -3%
> ---------------------------------
>      before | 0.18    | 1.12
> CPU   after | 0.18    | 1.15
>      change | +0%     | +3%
>
> So nearly all of the size benefit, but very little CPU change (even the
> 3% on the larger-pack case is close to the levels of run-to-run noise).
> Obviously the size benefit in the larger-pack case isn't impressive, but
> I think the "fetches" case is much more indicative of a real server
> load.

Indeed. Please make sure to capture those numbers in the commit log.


Nicolas
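
For readers who want to poke at the condition outside of git, here is a minimal standalone sketch of the patched check. It is only a model, not the code in builtin/pack-objects.c: the struct entry type, the skip_delta_attempt() and self_check() names, and the cut-down enum are assumptions made for illustration. Only the field names (in_pack, in_pack_type, preferred_base) and the shape of the condition are taken from the patch quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not git's real definitions. */
enum obj_type { OBJ_BLOB = 3, OBJ_OFS_DELTA = 6, OBJ_REF_DELTA = 7 };

struct entry {
	const void *in_pack;        /* pack the object is stored in, or NULL */
	enum obj_type in_pack_type; /* how the object is stored in that pack */
	bool preferred_base;        /* boundary object, not sent in the pack */
};

/*
 * Returns true when the (trg, src) pair would be skipped without
 * attempting a delta, following the patched condition.
 */
static bool skip_delta_attempt(bool reuse_delta,
			       const struct entry *trg,
			       const struct entry *src)
{
	return reuse_delta && trg->in_pack &&
	       trg->in_pack == src->in_pack &&
	       !src->preferred_base &&
	       trg->in_pack_type != OBJ_REF_DELTA &&
	       trg->in_pack_type != OBJ_OFS_DELTA;
}

/* Small usage example: two non-delta objects living in the same pack. */
void self_check(void)
{
	int pack; /* its address serves as a dummy pack identity */
	struct entry trg = { &pack, OBJ_BLOB, false };
	struct entry src = { &pack, OBJ_BLOB, false };

	/* Both objects are being sent: the pair is skipped as before. */
	assert(skip_delta_attempt(true, &trg, &src));

	/* The source is a preferred base: the delta is now attempted. */
	src.preferred_base = true;
	assert(!skip_delta_attempt(true, &trg, &src));
}
```

In this model the only behavioral difference from the pre-patch condition is the preferred_base case; every other combination returns the same answer as before.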