Re: Performance issue: initial git clone causes massive repack

On Sun, Apr 05, 2009 at 12:17:03PM -0700, Shawn O. Pearce wrote:
> Another option is to use rsync:// for initial clones.
>   git clone rsync://git.gentoo.org/tree.git
> rsync should be more efficient at dragging 1.6GiB over the network,
> as it's only streaming the files.  But it may fall over if the server
> has a lot of loose objects; many more small files to create.
I just tried this, and ran into a segfault.

Original command:
# git clone rsync://git.overlays.gentoo.org/vcs-public-gitroot/exp/gentoo-x86.git

At a glance, it looks like the linked list contains a NULL entry that the
inner while loop hits: the code uses 'list->next' without checking 'list' first.

gdb> bt
#0  strcmp () at ../sysdeps/x86_64/strcmp.S:30
#1  0x000000000049474c in get_refs_via_rsync (transport=<value optimized out>, for_push=<value optimized out>) at transport.c:123
#2  0x000000000049234c in transport_get_remote_refs (transport=0x725fc9) at transport.c:1045
#3  0x000000000041620a in cmd_clone (argc=<value optimized out>, argv=0x7fff908c8550, prefix=<value optimized out>) at builtin-clone.c:487
#4  0x0000000000404f59 in handle_internal_command (argc=0x2, argv=0x7fff908c8550) at git.c:244
#5  0x0000000000405167 in main (argc=0x2, argv=0x7fff908c8550) at git.c:434
gdb> up
#1  0x000000000049474c in get_refs_via_rsync (transport=<value optimized out>, for_push=<value optimized out>) at transport.c:123
123					(cmp = strcmp(buffer + 41,
gdb> print list
$1 = {nr = 0x0, alloc = 0x0, name = 0x0}

If I then go into the repo and run git-fetch again manually, it works fine.
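
For illustration only, a minimal sketch of the kind of guard described above:
walking a sorted, singly linked list of names and checking each node pointer
before it is dereferenced or passed to strcmp(). This is not the code in
transport.c; `struct name_node` and `insert_sorted` are hypothetical names
made up for this example, and the real fix may well look different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for illustration; not git's actual ref structure. */
struct name_node {
	struct name_node *next;
	char *name;
};

/*
 * Insert a copy of `name` into the sorted list rooted at `*head`.
 * Returns 1 if inserted, 0 if the name was already present,
 * and -1 on bad arguments or allocation failure.
 */
static int insert_sorted(struct name_node **head, const char *name)
{
	struct name_node **pos = head;
	struct name_node *node;
	size_t len;
	int cmp = 1;

	if (!head || !name)
		return -1;

	/* Test the node pointer first, so strcmp() never sees a NULL node. */
	while (*pos && (cmp = strcmp(name, (*pos)->name)) > 0)
		pos = &(*pos)->next;

	/* The loop stopped on an equal name: nothing to insert. */
	if (*pos && cmp == 0)
		return 0;

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	len = strlen(name) + 1;
	node->name = malloc(len);
	if (!node->name) {
		free(node);
		return -1;
	}
	memcpy(node->name, name, len);

	/* Splice the new node in front of the first larger entry (or at the end). */
	node->next = *pos;
	*pos = node;
	return 1;
}
```

With an empty list the loop body never runs, so the first insertion takes the
"append at the end" path instead of comparing against a missing entry.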

> One way around that would be to use two repositories on the server;
> a historical repository that is fully packed and contains the full
> history, and a bleeding edge repository that users would normally
> work against:
Yup, we've been considering something similar. We do have one specific need
with that, however: to prevent resource abuse, we would like to DENY initial
clones over git:// in that setup, so that nobody can DoS our servers by
running a couple of hungry initial clones at once.

> That caching GSoC project may help, but didn't I see earlier in
> this thread that you have >4.8 million objects in your repository?
> Any proposals on that project would still have Git malloc()'ing
> data per object; it's ~80 bytes per object needed, so that's a data
> segment of 384+ MiB, per concurrent clone client.
384 MiB or even 512 MiB I can cover. It's the 200+ wallclock minutes of CPU
burn with no download that isn't acceptable.
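
As a back-of-the-envelope check of the memory figure quoted above: 4.8 million
objects at roughly 80 bytes each comes to 384,000,000 bytes, which is about
366 MiB. Since the object count is "more than" 4.8 million, the real number
would be somewhat higher. The sketch below only reproduces that arithmetic;
the helper names are hypothetical and the 80 bytes per object is the estimate
from the quoted mail, not something measured here.

```c
#include <stdint.h>

/* Total bytes needed for `objects` entries of `bytes_per_object` each. */
static uint64_t estimate_bytes(uint64_t objects, uint64_t bytes_per_object)
{
	return objects * bytes_per_object;
}

/* Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
static double bytes_to_mib(uint64_t bytes)
{
	return (double)bytes / (1024.0 * 1024.0);
}
```

For example, `bytes_to_mib(estimate_bytes(4800000, 80))` gives the per-client
figure for the object count mentioned in the thread.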

P.S.
The -v output of the rsync-mode git-fetch is nearly empty. Could we pipe the
rsync progress back?


-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@xxxxxxxxxx
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
