Re: [RFC] repack vs re-clone


 



On Sun, 10 Feb 2008, Marco Costalba wrote:

> Sometimes I find myself re-cloning an entire repository, for example
> the Linux tree, instead of repacking my local copy.
> 
> The reason is that the published Linux repository is super compressed
> and to reach the same level of compression on my local copy I would
> need to give my laptop a long night running.

No.  I really doubt the public Linux repository is compressed with 
anything but the default repack settings.

And on my average PC by today's standards (P4 @ 3GHz with 1GB memory), 
repacking the Linux repo takes less than 6.5 minutes, and peak RSS is 
around 450MB.
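For anyone who wants to reproduce this kind of measurement on their own machine, here is a minimal sketch. It assumes a reasonably recent git (for `git -C`), and `full_repack` is a hypothetical helper name, not a git command. The `--window=10 --depth=50` values only spell out git's defaults for `pack.window` and `pack.depth`, so this is an ordinary full repack (`-f` recomputes deltas from scratch), not an aggressive one.

```shell
# Hypothetical helper: fully repack the repository in directory $1
# (default: current directory) with git's default delta window (10)
# and depth (50), then print the resulting object statistics.
full_repack() {
    repo=${1:-.}
    # -a: put everything into a single pack
    # -d: delete redundant old packs and prune now-packed loose objects
    # -f: recompute deltas instead of reusing the existing ones
    git -C "$repo" repack -a -d -f -q --window=10 --depth=50 || return 1
    # Prints count, size, in-pack, packs, size-pack (KiB), etc.
    git -C "$repo" count-objects -v
}
```

On a GNU system, running the repack step under `/usr/bin/time -v` additionally reports the elapsed time and the peak RSS ("Maximum resident set size").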

> So it simply turns out to be faster to re-clone the whole thing from upstream.

Only if you're lucky enough to have a fast connection to the net with a high 
transfer quota.

> Also, repacking a big repo in the optimal way is not so trivial: you
> need to understand fairly advanced stuff like the delta window and depth,
> and the pack parameters used upstream are probably much better than
> what you could guess by trial and error. Or you may simply not have
> as much RAM as would be needed.

If such is your case, why would you fully repack your repo in the first 
place?  Simply running 'git gc' should be quite good enough for people 
uninterested in the "advanced" stuff.  The repack that 'git gc' uses 
will happily reuse existing packed data from upstream.
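A minimal sketch of that "just run git gc" route, showing the pack size before and after. `gc_and_compare` is a hypothetical helper name; the only git commands it uses are plain `git gc` (which, unlike `git gc --aggressive` or `git repack -f`, reuses existing deltas such as those received from upstream at clone time) and `git count-objects -v`, whose `size-pack` field is in KiB.

```shell
# Hypothetical helper: run a plain "git gc" on the repository in $1
# (default: current directory) and report the pack size before and after.
gc_and_compare() {
    repo=${1:-.}
    before=$(git -C "$repo" count-objects -v | awk '/^size-pack:/ {print $2}')
    # No --aggressive here: existing deltas are reused, so this is cheap.
    git -C "$repo" gc --quiet || return 1
    after=$(git -C "$repo" count-objects -v | awk '/^size-pack:/ {print $2}')
    echo "size-pack before: ${before} KiB, after: ${after} KiB"
}
```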



> On the other hand, it would be interesting to know, before starting the
> new clone, what the real advantage would be, i.e. what the size of the
> upstream repository is.

You can already query the remote repository directory listing and figure 
it out.  For example:

lftp -c 'open http://kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects/pack && ls'

And you'll note that even upstream isn't always fully packed in advance.
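The lftp listing relies on the web server exposing a directory index for `objects/pack`. A rough alternative sketch, assuming the repository is also published for git's "dumb" HTTP transport (so that `objects/info/packs`, written by `git update-server-info`, is present) and that the server answers HEAD requests with a `Content-Length` header. Neither is guaranteed, and today's kernel.org URLs may not serve this layout at all. `pack_names` and `remote_pack_bytes` are hypothetical helpers.

```shell
# Read an objects/info/packs file on stdin and print the pack names.
# Relevant lines look like "P pack-<sha1>.pack"; other lines are ignored.
pack_names() {
    awk '$1 == "P" {print $2}'
}

# Sum the sizes (in bytes) of the packs advertised at the repository URL $1.
# Assumes dumb-HTTP layout and Content-Length on HEAD (see above).
remote_pack_bytes() {
    base=${1%/}
    total=0
    for pack in $(curl -fsS "$base/objects/info/packs" | pack_names); do
        # HEAD request; strip CRs and keep the last Content-Length seen.
        len=$(curl -fsSI "$base/objects/pack/$pack" | tr -d '\r' |
              awk 'tolower($1) == "content-length:" {print $2}' | tail -n 1)
        total=$((total + ${len:-0}))
    done
    echo "$total"
}
```

The result can be compared with the local `size-pack` value from `git count-objects -v`, keeping in mind that the latter is in KiB and includes the pack index files.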

> So I would like to ask whether anyone would find these useful:
> 
> - A command like 'git info' or something similar that prints the size
> of the local and upstream repositories (and possibly other things)
> 
> - An option like 'git repack --clone' to instruct git to download and
> use current upstream packs instead of trying to recreate new ones.

I think that would be a very bad idea.  Not only is this rather 
unnecessary (either you can afford to repack locally, or you live with 
the upstream-provided packing and repack incrementally, which is pretty 
good enough), but it is also really bad resource-wise (it would only end 
up wasting network bandwidth and CPU cycles on the server).


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
