Re: [RFC] Add --create-cache to repack

On Sat, 29 Jan 2011, Junio C Hamano wrote:

> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
> 
> > I fully implemented the reuse of a cached pack behind a thin pack idea
> > I was trying to describe in this thread.  It saved 1m7s off the JGit
> > running time, but increased the data transfer by 25 MiB.  I didn't
> > expect this much of an increase; I honestly expected the thin pack
> > portion to be, well, thinner.  The issue is the thin pack cannot delta
> > against all of the history; it's only delta compressing against the tip
> > of the cached pack.  So long-lived side branches that forked off an
> > older part of the history aren't delta compressing well, or at all,
> > and that is significantly bloating the thin pack.  (It's also why that
> > "newer" pack is 57M, but should be 14M if correctly combined with the
> > cached pack.)  If I were to consider all of the objects in the cached
> > pack as potential delta base candidates for the thin pack, the entire
> > benefit of the cached pack disappears.
> 
> What if you instead use the cached pack this way?
> 
>  0. You perform the proposed pre-traversal until you hit the tip of cached
>     pack(s), and realize that you will end up sending everything.
> 
>  1. Instead of sending the new part of the history first and then sending
>     the cached pack(s), you send the contents of cached pack(s), but also
>     note what objects you sent;
> 
>  2. Then you send the new part of the history, taking full advantage of
>     what you have already sent, perhaps doing only half of the reuse-delta
>     logic (i.e. you reuse what you can reuse, but you do _not_ punt on an
>     object that is not a delta in an existing pack).

The problem is determining the best base object to delta against.  If
you end up listing all the already-sent objects and attempting deltas
against them for each remaining non-delta object to find the best
match, then you might end up taking more CPU time than the current
enumeration phase.
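
To put rough numbers on that concern, here is a minimal sketch that only
counts delta attempts; it is not how pack-objects is implemented.  The
function names and object counts are made up for illustration.  The
window of 10 matches the documented default of pack.window, and the
model assumes each object is tried only against the candidates that
precede it in a sorted list.

```python
def exhaustive_attempts(num_new: int, num_sent: int) -> int:
    """Delta attempts if every new non-delta object is tried against
    every object already sent from the cached pack (hypothetical model)."""
    return num_new * num_sent


def windowed_attempts(num_new: int, window: int = 10) -> int:
    """Delta attempts if each new object is tried only against at most
    `window` preceding candidates in a sorted list (simplified model)."""
    return sum(min(i, window) for i in range(num_new))


if __name__ == "__main__":
    # Illustrative sizes only: 100 new objects, 1000 objects already sent.
    print("exhaustive:", exhaustive_attempts(100, 1000))
    print("windowed:  ", windowed_attempts(100))
```

The exhaustive count grows with the product of the two object counts,
while the windowed count grows only with the number of new objects, which
is why searching the whole cached pack for bases could cost more CPU than
it saves.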


Nicolas