On Sun, 10 Feb 2008, Marco Costalba wrote:

> Sometimes I find myself re-cloning a repository entirely, for example
> the Linux tree, instead of repacking my local copy.
>
> The reason is that the published Linux repository is super compressed,
> and to reach the same level of compression on my local copy I would
> need to give my laptop a long night of running.

No.  I really doubt the public Linux repository is compressed with
anything but the default repack settings.  And on my PC, average by
today's standards (a P4 @ 3GHz with 1GB of memory), repacking the
Linux repo takes less than 6.5 minutes, and peak RSS is around 450MB.

> So it happens to be just faster to re-clone the whole thing from
> upstream.

Only if you're lucky enough to have a fast connection to the net with
a high transfer quota.

> Also, repacking a big repo in the optimal way is not so trivial: you
> need to understand quite advanced stuff like the window and depth
> settings, and the pack parameters used upstream are probably easily
> better than what you could 'guess' by trying yourself.  Or you simply
> don't have as much RAM as would be needed.

If such is your case, why would you fully repack your repo in the
first place?  Simply running 'git gc' should be quite good enough for
people uninterested in the "advanced" stuff.  The repack that 'git gc'
uses will happily reuse existing packed data from upstream.
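To make the contrast concrete, the two approaches look roughly like
this (the window/depth values below are purely illustrative, not a
recommendation):

  # the easy way: reuses existing packed data, fast and cheap
  git gc

  # the "advanced" way: -f forgoes delta reuse and recomputes all
  # deltas from scratch, and larger window/depth values make the
  # delta search slower and more memory hungry still
  git repack -a -d -f --window=100 --depth=100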
> On the other hand, it would be interesting to know, before starting
> the new clone, what the real advantage of this is, i.e. what the
> repository size upstream is.

You can already query the remote repository directory listing and
figure it out.  For example:

  lftp -c 'open http://kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects/pack && ls'

And you'll note that even upstream isn't always fully packed in
advance.
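For the local side of the comparison, something as simple as this
will do (the path assumes the usual non-bare layout):

  # total size of the local pack files
  du -sh .git/objects/pack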
> So I would like to ask if anyone would consider useful:
>
> - A command like 'git info' or something like that which prints the
>   size of the local and upstream repositories (among possibly other
>   things)
>
> - An option like 'git repack --clone' to instruct git to download
>   and use the current upstream packs instead of trying to recreate
>   new ones.

I think that would be a very bad idea.  Not only is this rather
unnecessary (either you can afford to repack locally, or you live with
the upstream-provided packing and repack incrementally, which is quite
good enough), but it is also really bad resource-wise (it would end up
only wasting net bandwidth and CPU cycles on the server).
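Incremental repacking, for reference, is as simple as this sketch:

  # pack loose objects into a new pack and drop the now-redundant
  # loose copies, leaving the packs you got from upstream untouched
  git repack -d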
Nicolas