On Tue, 16 Sep 2008, Scott Chacon wrote:

> I was wondering if it would be of general interest to have upload-pack
> take an option to cache packfiles.  Unless I am mistaken, every clone
> on a git server will recreate the same packfile until something new is
> pushed into it, correct?  I thought it might be a good idea to pass an
> option to have it cache the packfile that is created if
> create_full_pack is set and re-use it until the repository is updated.
> If I patched upload-pack to do this, would there be any interest in
> it?

Well, if you do that there are a few things to be careful about.

First, having a server process able to write files is a security hazard.
If you want to create a pack cache then it is best if it is created
manually by the repository owner.  You don't want someone cloning a
repository actually messing with such a cache.

Secondly, the dynamic creation of a pack currently takes into account
the capabilities of the client so as not to produce a pack with features
that the client does not support.  So in order not to have to cache
packs with many feature combinations, this cache should probably only
take effect if the pack capabilities of the server are also supported by
the client.

Now, the _only_ advantage of a cached pack file is in avoiding execution
of rev-list.  Otherwise creation of a pack for streaming is almost
identical to straight copying of data from disk due to pack data reuse.
The rev-list can be made faster by having the pack-objects process do
the object listing itself instead of piping the output from rev-list
into it ('git repack' does that but 'git-upload-pack' doesn't).  And I
believe that rev-list could be made much much faster with pack v4.

That being said...

What you could have is a simple file with 2 SHA1s: the first
corresponding to the output of 'git for-each-ref' and the second one
corresponding to the list of all objects reachable from those refs.
For example:

1) git for-each-ref --format="%(objectname)" --sort=objectname | sha1sum

2) git for-each-ref --format="%(objectname)" | \
   xargs git rev-list --objects | cut -c -40 | sort | sha1sum

So, if you do the above in a freshly cloned repository, you'll find that
the SHA1 in 2) corresponds to this:

3) git show-index < .git/objects/pack/pack-*.idx | cut -f2 -d' ' | sha1sum

which means that all objects reachable from all refs are found in the
only pack you have.  Now, if the SHA1 in 2) is computed over the binary
representation of all those object names, you'll find out that it
corresponds to the actual pack name in the .git/objects/pack/ directory.

So what upload-pack could do is look for a special file with those 2
SHA1s, and if it exists then see if the first SHA1 matches the list of
values for all refs, if so then the name of the pack to send out
corresponds to the second SHA1.  If that pack is found in the repository
then you just have to stream it out.

Creating that file is then just a matter of doing the equivalent of the
above commands and repacking your repository into a single pack.


Nicolas
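
The lookup that upload-pack would perform could be sketched roughly as
follows.  This is only an illustration under stated assumptions, not
anything git provides: the function name lookup_cached_pack, the cache
file layout (a single line holding the two SHA1s separated by a space)
and the cache file location are all hypothetical.  The caller is
expected to compute the current refs digest itself, for instance with
command 1) above.

```shell
# lookup_cached_pack CACHE_FILE CURRENT_REFS_SHA1 PACK_DIR
#
# Hypothetical helper.  CACHE_FILE is assumed to contain one line:
#   <sha1 of the sorted ref values> <sha1 naming the pack>
# Prints the path of the pack to stream and returns 0 when the cache
# entry matches the current refs and the pack exists; returns 1
# otherwise, in which case the pack has to be generated as usual.
lookup_cached_pack() {
    cache_file=$1
    current=$2
    pack_dir=$3

    [ -r "$cache_file" ] || return 1

    # Read the two SHA1s; tolerate a missing trailing newline.
    refs_sha1= pack_sha1=
    read -r refs_sha1 pack_sha1 < "$cache_file" || true
    [ -n "$refs_sha1" ] && [ -n "$pack_sha1" ] || return 1

    # The refs have changed since the cache was written: cache is stale.
    [ "$refs_sha1" = "$current" ] || return 1

    # The second SHA1 names the pack; it must actually be present.
    pack="$pack_dir/pack-$pack_sha1.pack"
    [ -f "$pack" ] || return 1

    printf '%s\n' "$pack"
}

# Possible use inside a repository (the "pack-cache" name is made up):
#
#   current=$(git for-each-ref --format="%(objectname)" \
#                 --sort=objectname | sha1sum | cut -c -40)
#   if pack=$(lookup_cached_pack .git/pack-cache "$current" \
#                 .git/objects/pack); then
#       cat "$pack"
#   fi
```

Keeping the function read-only matches the first point above: the
serving side only ever compares and reads, and writing the cache file
stays with the repository owner.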