On Fri, Jan 28, 2011 at 17:32, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
> Well, scratch the idea in this thread.  I think.
>
> I retested JGit vs. C Git on an identical linux-2.6 repository.  The
> repository was fully packed, but had two pack files, 362M and 57M.  It
> was created by packing a 1 month old master, marking it .keep, and
> then repacking -a -d to get the most recent month into another pack.
> This results in some files that should be delta compressed together
> being stored whole across the two packs (obviously).
>
> The two implementations take about the same amount of time to generate
> the clone: 3m28s / 3m22s for JGit, 3m23s for C Git.  The JGit-created
> pack is actually smaller, 376.30 MiB vs. C Git's 380.59 MiB.

I just tried caching only the object list of what is reachable from a
particular commit.  The file has a small 20 byte header:

  4 byte magic
  4 byte version
  4 byte number of commits (C)
  4 byte number of trees (T)
  4 byte number of blobs (B)

Then come C commit SHA-1s, followed by T tree SHA-1 + 4 byte path_hash
records, followed by B blob SHA-1 + 4 byte path_hash records.

For any project the size is basically on par with the .idx file for the
pack v1 format, so ~41 MB for linux-2.6.  The file is stored as
$GIT_OBJECT_DIRECTORY/cache/$COMMIT_SHA1.list, and is completely
pack-independent.

Using this for object enumeration shaves almost 1 minute off server
packing time; the clone dropped from 3m28s to 2m29s.  That is close to
what I was getting with the cached pack idea, but the network transfer
stayed at the smaller 376 MiB.

I think this supports your pack v4 work... if we can speed up object
enumeration to be this simple (scanning down a list of objects with
their types declared inline, or implied by location), we can cut a full
minute of CPU time off the server side.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html