Re: [RFC] Add --create-cache to repack

On 01/30/2011 12:14 PM, Nicolas Pitre wrote:
On Sat, 29 Jan 2011, Junio C Hamano wrote:

Shawn Pearce <spearce@xxxxxxxxxxx> writes:

I fully implemented the reuse of a cached pack behind a thin pack idea
I was trying to describe in this thread.  It saved 1m7s off the JGit
running time, but increased the data transfer by 25 MiB.  I didn't
expect this much of an increase; I honestly expected the thin pack
portion to be, well, thinner.  The issue is the thin pack cannot delta
against all of the history; it's only delta compressing against the tip
of the cached pack.  So long-lived side branches that forked off an
older part of the history aren't delta compressing well, or at all,
and that is significantly bloating the thin pack.  (It's also why that
"newer" pack is 57M, but should be 14M if correctly combined with the
cached pack.)  If I were to consider all of the objects in the cached
pack as potential delta base candidates for the thin pack, the entire
benefit of the cached pack disappears.
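
Shawn's point about side branches can be illustrated with a toy model. This is a minimal sketch, not git's delta encoder: it assumes delta size is roughly the number of target characters not covered by a matching block in the base (measured with Python's `difflib.SequenceMatcher`), and `delta_cost`, `best_cost` and the sample strings are hypothetical. A side branch forked from an old version of a file finds a good base only if older history is searchable, not just the cached pack's tip.

```python
from difflib import SequenceMatcher
from typing import Sequence


def delta_cost(base: str, target: str) -> int:
    """Crude stand-in for delta size: characters of `target` that no
    matching block from `base` covers (so they would be sent literally)."""
    blocks = SequenceMatcher(None, base, target, autojunk=False).get_matching_blocks()
    return len(target) - sum(block.size for block in blocks)


def best_cost(target: str, candidates: Sequence[str]) -> int:
    """Cheapest encoding of `target`: a full copy (len(target)) or a delta
    against whichever candidate base matches best."""
    return min([len(target)] + [delta_cost(base, target) for base in candidates])


# Toy history: the side branch forked from an *old* version of a file,
# while the cached pack's tip holds a heavily rewritten version.
old = "alpha beta gamma delta\n" * 4
tip = "rewritten file contents\n" * 4
side = old + "side fix\n"

tip_only = best_cost(side, [tip])          # thin pack: bases from the tip only
all_history = best_cost(side, [old, tip])  # bases from all cached history
print(tip_only, all_history)
```

Searching all of the cached history finds the old version as a near-perfect base, which is exactly the search the cached pack is meant to avoid.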

What if you instead use the cached pack this way?

  0. You perform the proposed pre-traversal until you hit the tip of cached
     pack(s), and realize that you will end up sending everything.

  1. Instead of sending the new part of the history first and then sending
     the cached pack(s), you send the contents of cached pack(s), but also
     note what objects you sent;

  2. Then you send the new part of the history, taking full advantage of
     what you have already sent, perhaps doing only half of the reuse-delta
     logic (i.e. you reuse what you can reuse, but you do _not_ punt on an
     object that is not a delta in an existing pack).
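
Junio's three steps can be sketched as follows. This is a hypothetical model, not git's pack-objects code: commits and objects are plain strings, `introduced` maps each commit to the objects it adds, and it assumes a full clone, so reaching any cached commit in step 0 means the whole cached pack is sent. New objects may delta against anything already sent, including the cached pack's contents.

```python
from collections import deque


def walk_new_commits(parents, want, cached_commits):
    """Step 0: walk back from `want`, stopping at any commit the cached
    pack already covers.  Returns (new_commits, hit_cache)."""
    new_commits, seen, hit_cache = [], set(), False
    queue = deque([want])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in cached_commits:
            hit_cache = True  # boundary: do not walk into cached history
            continue
        new_commits.append(commit)
        queue.extend(parents.get(commit, ()))
    return new_commits, hit_cache


def plan_send(parents, introduced, want, cached_commits, cached_objects):
    """Steps 1 and 2: emit the cached pack first and remember what it sent,
    then emit new objects; each new object may delta against anything
    already sent.  Returns a list of (object, eligible_bases)."""
    new_commits, hit_cache = walk_new_commits(parents, want, cached_commits)
    plan, sent = [], set()
    if hit_cache:
        for obj in cached_objects:          # step 1: sent as-is from the cache
            plan.append((obj, frozenset()))
            sent.add(obj)
    for commit in reversed(new_commits):    # roughly oldest-first (simplified)
        for obj in introduced.get(commit, ()):
            if obj not in sent:             # step 2: any sent object is a base
                plan.append((obj, frozenset(sent)))
                sent.add(obj)
    return plan


# Hypothetical history: cached pack = C1..C2; C3 extends it, S1 forks from C1.
parents = {"C2": ["C1"], "C3": ["C2"], "S1": ["C1"], "M": ["C3", "S1"]}
introduced = {"C3": ["t3", "b3"], "S1": ["ts", "bs"], "M": ["tm"]}
plan = plan_send(parents, introduced, "M", {"C1", "C2"}, ["t1", "b1", "t2", "b2"])
```

Note that the side-branch objects (`ts`, `bs`) can now use the old cached objects as bases, which the thin-pack-first ordering could not offer.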

The problem is to determine the best base object to delta against.  If
you end up listing all the already-sent objects and performing delta
attempts against them for the remaining non-delta objects to find the
best match, then you might end up taking more CPU time than the current
enumeration phase.
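
The cost concern is simple arithmetic: an exhaustive search costs one delta attempt per (target, already-sent object) pair, while a sliding window caps the attempts per target. A minimal sketch, where `delta_attempts` is a hypothetical helper and the object counts are made up purely for illustration; the window of 10 matches the documented default of `git pack-objects --window`.

```python
from typing import Optional


def delta_attempts(n_targets: int, n_sent: int, window: Optional[int] = None) -> int:
    """Upper bound on delta attempts for `n_targets` non-delta objects.

    window=None models exhaustive search over every already-sent object;
    an integer models a sliding window that tries at most `window` bases
    per object (git pack-objects' --window defaults to 10)."""
    per_object = n_sent if window is None else min(window, n_sent)
    return n_targets * per_object


# Hypothetical sizes, for illustration only.
print(delta_attempts(50_000, 2_000_000))             # 100000000000
print(delta_attempts(50_000, 2_000_000, window=10))  # 500000
```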

Why worry about finding the best base here? For each object to pack, just add the object (or one of the objects) with the same path in the commit you found in step 0 above to its delta base search.
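
The same-path suggestion can be sketched as one extra candidate per object on top of whatever the normal window search already considers. This is a hypothetical illustration, not git code: `tip_tree` stands for the path-to-object map of the commit found in step 0, `window_candidates` for the bases the usual search would try, and all IDs are made up.

```python
def add_same_path_bases(to_pack, tip_tree, window_candidates):
    """For each (object_id, path) to pack, append the object found at the
    same path in the step-0 commit's tree to that object's delta-base
    candidates.  Returns {object_id: [candidate base ids]}."""
    candidates = {}
    for obj, path in to_pack:
        bases = list(window_candidates.get(obj, ()))
        hint = tip_tree.get(path)            # same path at the cached tip
        if hint is not None and hint != obj and hint not in bases:
            bases.append(hint)               # one extra, cheap candidate
        candidates[obj] = bases
    return candidates


tip_tree = {"README": "r1", "src/a.c": "a1"}   # hypothetical path -> blob ids
to_pack = [("r2", "README"), ("a2", "src/a.c"), ("n1", "src/new.c")]
print(add_same_path_bases(to_pack, tip_tree, {"r2": ["x9"]}))
# {'r2': ['x9', 'r1'], 'a2': ['a1'], 'n1': []}
```

The lookup is a single dictionary probe per object, so it adds almost nothing to enumeration time, unlike searching every already-sent object.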
