Re: Optimizing cloning of a high object count repository

On Sat, 13 Dec 2008, Nicolas Pitre wrote:

> On Sat, 13 Dec 2008, Resul Cetin wrote:
> 
> > On Saturday 13 December 2008 16:46:50 you wrote:
> > [...]
> > > >  The Linux repository seems to be smaller, but is in the same
> > > > range for object count and repository size, yet clones are much,
> > > > much faster. Is there any way to optimize the server operations,
> > > > like counting and compressing of objects, to get the same speed we
> > > > get from git.kernel.org (which does it in nearly no time; the only
> > > > limiting factor seems to be my bandwidth)?
> > > >  The only other information I have is that Robin H. Johnson made a single
> > > >  ~910MiB pack for the whole repository.
> > >
> > > Make yearly packed repository snapshots and publish them via http.
> > > People can wget the latest snapshot, then pull updates later.
> > That would be a workaround, but it doesn't explain why git.kernel.org 
> > delivers Torvalds' repository without any notable counting and 
> > compressing time. Maybe it has something to do with the config I found 
> > inside the repository:
> > http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
> > It says that it isn't a bare repository.
> 
> That's not relevant.
> 
> The counting time is a bit unfortunate (although I have plans to speed 
> that up, if only I can find the time).
> 
> You should be able to skip the compression time entirely though, if you 
> do repack the repository first.  And you want it to be as tightly packed 
> as possible for public access.  I'm currently cloning it and the 
> counting phase is not _that_ bad compared to the compression phase.  Try 
> something like 'git repack -a -f -d --window=200' and let it run 
> overnight if necessary.  You need to do this only once, preferably on a 
> 64-bit machine with lots of RAM.  Once this is done, things should go 
> much more smoothly afterwards.
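
(As an aside, the yearly-snapshot idea quoted further up could be 
implemented with 'git bundle'.  This is only a sketch; the paths and URLs 
below are made up:

  # on the server, once a year:
  git bundle create /var/www/snapshots/gentoo-x86.bundle --all

  # on a client, to bootstrap a clone:
  wget http://example.org/snapshots/gentoo-x86.bundle
  git clone gentoo-x86.bundle gentoo-x86
  cd gentoo-x86
  git config remote.origin.url git://git.overlays.gentoo.org/exp/gentoo-x86.git
  git pull

The bundle is fetched as a plain static file over http, so the server does 
no counting or compressing at all for the initial download.)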

FYI, I repacked that repository after cloning it, and that operation 
required around 2.5G of resident memory.  Given the address space 
fragmentation, it is possible that a full repack cannot be performed on 
a 32-bit machine.
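
(If you are stuck on a 32-bit machine, one thing that may help -- untested 
on this particular repository -- is to bound the memory used by the delta 
search so pack-objects stays within the address space, at some cost in 
pack tightness:

  git config pack.windowMemory 256m   # cap delta window memory per thread
  git config pack.threads 1           # avoid multiplying that cap per core
  git repack -a -f -d --window=200

The 256m value is just a guess; tune it to what the machine can spare.)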

I did 'git repack -a -f -d --window=500 --depth=100'.  This took less 
than an hour on a quad-core machine.  The resulting pack is 695MB in 
size.  That's the amount of data that would be transferred during a 
clone of this repository, and nothing would have to be compressed during 
the clone, as everything is already fully compressed.
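
(To make sure a routine 'git gc' on the server doesn't later redo such a 
tight pack with its default, much looser settings, you can mark the pack 
as kept by dropping a .keep file next to it; the pack name below is just a 
placeholder:

  cd $GIT_DIR/objects/pack      # .git/objects/pack in a non-bare repo
  touch pack-<sha1-of-the-new-pack>.keep

Packs with a matching .keep file are left untouched by repack.)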


Nicolas
