Re: Performance issue: initial git clone causes massive repack

Nicolas Sebrecht <nicolas.s-dev@xxxxxxxxxxx> wrote:
> On Sun, Apr 05, 2009 at 12:04:12AM -0700, Robin H. Johnson wrote:
> >          The GSoC 2009 ideas contain a potential project for caching the
> > generated packs, which, while having value in itself, could be partially
> > avoided by sending suitable pre-built packs (if they exist) without any
> > repacking.
> 
> Right. It could be an option to wait and see if the GSoC gives
> something.

Another option is to use rsync:// for initial clones.
 
Tell new developers that their initial command sequence to
(efficiently) get the base tree is:

  git clone rsync://git.gentoo.org/tree.git
  cd tree
  git config remote.origin.url git://git.gentoo.org/tree.git

rsync should be more efficient at dragging 1.6GiB over the network,
as it's only streaming the files.  But it may fall over if the server
has a lot of loose objects, since that means many more small files
to create.
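
On the loose-object concern, the server can check before it publishes
a repository over rsync://.  A minimal sketch, assuming a POSIX shell
and that the rsync module serves the repository directory as-is.  The
helper name ensure_packed and its threshold argument are hypothetical;
git count-objects -v and git repack -a -d are stock Git:

```shell
# Hypothetical helper: make sure a repository published over rsync://
# is fully packed, so rsync streams a few large pack files instead of
# creating thousands of small loose-object files on the client.
ensure_packed() {
    repo=$1            # path to the repository's git directory
    max_loose=${2:-0}  # hypothetical threshold of loose objects allowed

    # "git count-objects -v" reports the number of loose objects on
    # its "count:" line.
    loose=$(git --git-dir="$repo" count-objects -v | sed -n 's/^count: //p')
    [ -n "$loose" ] || return 1   # git failed or output not recognised

    if [ "$loose" -gt "$max_loose" ]; then
        # -a: put every reachable object into one new pack;
        # -d: drop redundant packs and prune now-packed loose objects.
        git --git-dir="$repo" repack -a -d -q || return 1
    fi
}
```

For example, run ensure_packed /srv/git/fully-packed-tree.git from
cron before rsync serves it (the path is made up).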

One way around that would be to use two repositories on the server;
a historical repository that is fully packed and contains the full
history, and a bleeding edge repository that users would normally
work against:

  git clone rsync://git.gentoo.org/fully-packed-tree.git tree
  cd tree
  git config remote.origin.url git://git.gentoo.org/tree.git
  git pull

Then every so often (e.g. once a Gentoo release cycle, so once
a year) pull the bleeding-edge repository into the fully packed
repository.  That will introduce a single new pack file (the .pack
plus its .idx), so the fully packed repository grows at a rate of
2 inodes/year, and is still very efficient to rsync on initial clones.
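
A minimal sketch of that yearly fold-in, run on the server against the
bare fully packed repository.  The function name fold_in, the branch
name and the paths are hypothetical.  git fetch only keeps what it
receives as a pack when there are at least fetch.unpackLimit objects
(default 100), otherwise it writes loose objects; a year of commits is
far above that, but the sketch sets the limit to 1 so the fold-in
always lands as one new pack:

```shell
# Hypothetical helper: fold the bleeding-edge repository into the
# fully packed one with a single fetch.
fold_in() {
    src=$1               # bleeding-edge repository (URL or path)
    dest=$2              # fully packed bare repository served over rsync://
    branch=${3:-master}  # branch to fold in (assumed name)

    # fetch.unpackLimit=1 makes Git store whatever it receives as one
    # pack file plus its index, instead of exploding it into loose objects.
    git --git-dir="$dest" -c fetch.unpackLimit=1 \
        fetch -q "$src" "refs/heads/$branch:refs/heads/$branch"
}
```

For example: fold_in git://git.gentoo.org/tree.git
/srv/git/fully-packed-tree.git master (the local path is made up).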


That caching GSoC project may help, but didn't I see earlier in
this thread that you have >4.8 million objects in your repository?
Any proposal from that project would still have Git malloc()'ing
data per object; at ~80 bytes per object that's a data segment of
384+ MB (about 366 MiB) per concurrent clone client.
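
Spelling out that estimate.  The object count and per-object cost are
the figures quoted above; the number of concurrent clients is only a
hypothetical example:

```shell
# Back-of-the-envelope memory estimate for serving clones.
objects=4800000       # ">4.8 million objects" (a lower bound)
bytes_per_object=80   # "~80 bytes per object" of malloc()'d data
total=$((objects * bytes_per_object))

echo "$total bytes per clone"               # 384000000 bytes per clone
echo "$((total / 1000000)) MB per clone"    # 384 MB per clone
echo "$((total / 1048576)) MiB per clone"   # 366 MiB per clone (rounded down)

clients=4             # hypothetical number of concurrent clones
echo "$((clients * total / 1048576)) MiB for $clients concurrent clones"   # 1464 MiB for 4 concurrent clones
```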

-- 
Shawn.