On Fri, Jan 28, 2011 at 17:32, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
> Well, scratch the idea in this thread.  I think.
>
> I retested JGit vs. C Git on an identical linux-2.6 repository.  The
> repository was fully packed, but had two pack files, 362M and 57M.  It
> was created by packing a 1 month old master, marking it .keep, and
> then repacking -a -d to get the most recent month into another pack.
> This results in some files that should be delta compressed together
> being stored whole across the two packs (obviously).
>
> The two implementations take about the same amount of time to generate
> the clone: 3m28s / 3m22s for JGit, 3m23s for C Git.  The JGit-created
> pack is actually smaller, 376.30 MiB vs. C Git's 380.59 MiB.

I just tried caching only the object list of what is reachable from a
particular commit.  The file has a small 20 byte header:

  4 byte magic
  4 byte version
  4 byte number of commits (C)
  4 byte number of trees (T)
  4 byte number of blobs (B)

Then come C commit SHA-1s, followed by T tree SHA-1 + 4 byte path_hash
records, followed by B blob SHA-1 + 4 byte path_hash records.

For any project the size is basically on par with the .idx file for the
pack v1 format, so ~41 MB for linux-2.6.  The file is stored as
$GIT_OBJECT_DIRECTORY/cache/$COMMIT_SHA1.list, and is completely
pack-independent.

Using this for object enumeration shaves almost 1 minute off server
packing time; the clone dropped from 3m28s to 2m29s.  That is close to
what I was getting with the cached pack idea, but the network transfer
stayed at the smaller 376 MiB.

I think this supports your pack v4 work... if we can speed up object
enumeration to be this simple (scanning down a list of objects with
their types declared inline, or implied by location), we can cut a full
minute of CPU time off the server side.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html