Re: thin packs ending up fat

On Thu, 12 Jan 2012, Jeff King wrote:

> On Thu, Jan 12, 2012 at 05:15:23PM -0500, Jeff King wrote:
> 
> > It turns out that when packing a subset of a fully packed repo (as we do
> > for a bundle or for a fetch), we tend not to make thin packs at all.
> > The culprit is this logic in try_delta:
> > 
> >         /*
> >          * We do not bother to try a delta that we discarded
> >          * on an earlier try, but only when reusing delta data.
> >          */
> >         if (reuse_delta && trg_entry->in_pack &&
> >             trg_entry->in_pack == src_entry->in_pack &&
> >             trg_entry->in_pack_type != OBJ_REF_DELTA &&
> >             trg_entry->in_pack_type != OBJ_OFS_DELTA)
> >                 return 0;
> > [...]
> > Maybe it is enough to simply turn off this optimization if the potential
> > delta source is not being included in the pack (i.e., we are using
> > --thin and it is a boundary object). Because if both objects are being
> > sent, we will just end up reusing the delta that goes in the reverse
> > direction anyway.
> 
> Hmm. It turns out this is really easy, because we have already marked
> such objects as preferred bases.

That's exactly what I was about to suggest after reading your first 
email.

> So with this patch:
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 96c1680..d05e228 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1439,6 +1439,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
>  	 */
>  	if (reuse_delta && trg_entry->in_pack &&
>  	    trg_entry->in_pack == src_entry->in_pack &&
> +	    !src_entry->preferred_base &&
>  	    trg_entry->in_pack_type != OBJ_REF_DELTA &&
>  	    trg_entry->in_pack_type != OBJ_OFS_DELTA)
>  		return 0;

Acked-by: Nicolas Pitre <nico@xxxxxxxxxxx>
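
For reference, the patched condition can be isolated as a small predicate.
This is only a sketch: struct entry_sketch and skip_delta_attempt() are
hypothetical stand-ins that keep just the three fields the check reads, and
the enum is a simplified copy of the object type codes, not the real
pack-objects declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the object type codes; only the delta kinds matter. */
enum obj_type_sketch { OBJ_BLOB = 3, OBJ_OFS_DELTA = 6, OBJ_REF_DELTA = 7 };

/* Hypothetical, reduced view of an object entry (not the real struct). */
struct entry_sketch {
	const void *in_pack;               /* pack currently holding the object, or NULL */
	enum obj_type_sketch in_pack_type; /* how the object is stored in that pack */
	int preferred_base;                /* nonzero: boundary object, not sent in a thin pack */
};

/*
 * Returns 1 when the delta attempt should be skipped: both objects sit in
 * the same pack, the target is stored undeltified there (so an earlier
 * repack presumably rejected this pairing), and the source is not a
 * preferred base.
 */
int skip_delta_attempt(int reuse_delta,
		       const struct entry_sketch *trg,
		       const struct entry_sketch *src)
{
	return reuse_delta && trg->in_pack &&
	       trg->in_pack == src->in_pack &&
	       !src->preferred_base &&
	       trg->in_pack_type != OBJ_REF_DELTA &&
	       trg->in_pack_type != OBJ_OFS_DELTA;
}

/* Usage: the boundary-object case no longer takes the shortcut. */
void example_boundary_case(void)
{
	int pack;                          /* any address works as a pack identity */
	struct entry_sketch trg = { &pack, OBJ_BLOB, 0 };
	struct entry_sketch src = { &pack, OBJ_BLOB, 1 };

	assert(skip_delta_attempt(1, &trg, &src) == 0);
}
```

With preferred_base cleared on the source, the same pair is skipped as
before, so the behavior for objects that are both being sent is unchanged.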

> here are the numbers I get:
> 
>                   dataset
>             | fetches | tags
> ---------------------------------
>      before | 53358   | 2750977
> size  after | 32398   | 2668479
>      change |   -39%  |      -3%
> ---------------------------------
>      before |  0.18   | 1.12
> CPU   after |  0.18   | 1.15
>      change |    +0%  |      +3%
> 
> So nearly all of the size benefit, but very little CPU change (even the
> 3% on the larger-pack case is close to the levels of run-to-run noise).
> Obviously the size benefit in the larger-pack case isn't impressive, but
> I think the "fetches" case is much more indicative of a real server
> load.

Indeed.  Please make sure to capture those numbers in the commit log.
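
When copying the table into the log, the "change" rows can be re-derived
from the before/after values. A minimal sketch, assuming "change" means
(after - before) / before rounded to the nearest whole percent;
rounded_pct_change() is a hypothetical helper, not part of git:

```c
#include <assert.h>

/* Percent change from 'before' to 'after', rounded half away from zero. */
long rounded_pct_change(double before, double after)
{
	double pct = 100.0 * (after - before) / before;

	return (long)(pct + (pct < 0 ? -0.5 : 0.5));
}

/* Usage: re-derive the change rows of the quoted table. */
void example_table_rows(void)
{
	assert(rounded_pct_change(53358, 32398) == -39);    /* size, fetches */
	assert(rounded_pct_change(2750977, 2668479) == -3); /* size, tags */
	assert(rounded_pct_change(0.18, 0.18) == 0);        /* CPU, fetches */
	assert(rounded_pct_change(1.12, 1.15) == 3);        /* CPU, tags */
}
```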


Nicolas