Re: faster git clone

On Thu, Jan 21, 2021 at 7:01 PM William Chen <williamchen32335@xxxxxxxxx> wrote:
>
> Dear Emily,
>
> I see your excellent contribution to git clone. I hope that you are well.

Hi William, this is a question much better directed at the Git list as a whole.

>
> When I try to clone a repo of a large size from github, it is slow.
>
> $ git clone https://github.com/git/git
> ...
> remote: Enumerating objects: 56, done.
> remote: Counting objects: 100% (56/56), done.
> remote: Compressing objects: 100% (25/25), done.
> Receiving objects:  23% (70386/299751), 33.00 MiB | 450.00 KiB/s
>
> The following aria2c command, which can use multiple downloading threads, is much faster. Would you please let me know whether there is a way to speed up git clone (maybe by using parallelization)?

In general, actual numbers are more compelling than "much faster",
e.g. the outputs of `time git clone https://github.com/git/git` and
`time aria2c https://github.com/git/git/archive/master.zip` - or even
an estimate from you, like, "I think clone takes a minute or two but
aria2c does the same thing in only a couple of seconds". "Much faster"
means something different to everyone :)
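
If it helps, here is a minimal sketch of one way to collect comparable
wall-clock numbers for the two commands. `time_cmd` is a hypothetical
helper name made up for this example (it is not part of Git or aria2),
and it assumes a `date` that supports `+%s`, which most systems have.
The shell's built-in `time` works just as well.

```shell
#!/bin/sh
# time_cmd LABEL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, discards its output, and prints
# the elapsed whole seconds together with the command's exit status.
time_cmd() {
    label=$1
    shift
    start=$(date +%s)        # seconds since the epoch, before the run
    "$@" >/dev/null 2>&1     # progress output is dropped on purpose
    status=$?
    end=$(date +%s)          # seconds since the epoch, after the run
    echo "$label: $((end - start))s (exit $status)"
    return "$status"
}

# Example invocations (commented out because they need network access):
# time_cmd clone   git clone https://github.com/git/git
# time_cmd archive aria2c https://github.com/git/git/archive/master.zip
```

Running both a few times and posting the printed lines would give the
list something concrete to compare.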

>
> Your help is much appreciated! I look forward to hearing from you. Thanks.
>
> $ aria2c https://github.com/git/git/archive/master.zip
>
> 01/21 20:16:04 [NOTICE] Downloading 1 item(s)
>
> 01/21 20:16:04 [NOTICE] CUID#7 - Redirecting to https://codeload.github.com/git/git/zip/master

Right here it looks like your zip download redirects to
codeload.github.com, presumably a CDN or a dedicated archive service,
which is probably better optimized for serving archives than the Git
server itself, so I would guess that has something to do with it too.

> [#59b6a2 8.2MiB/0B CN:1 DL:3.8MiB]
> 01/21 20:16:08 [NOTICE] Download complete: /private/tmp/git-master.zip
>
> Download Results:
> gid   |stat|avg speed  |path/URI
> ======+====+===========+=======================================================
> 59b6a2|OK  |   2.9MiB/s|/private/tmp/git-master.zip
>
> Status Legend:
> (OK):download completed.

There are others on the list who are better able to explain this than
I am. But I'd guess the upshot is that 'git clone
https://github.com/git/git' is asking a Git server, which is good at
Git repo management (e.g. accepting pushes, generating packfiles to
send you a specific object or branch, etc.) - but when you ask for
"git/git/archive/master.zip" you're getting the result of some work
the Git server already did a while ago to zip up the current 'master'
into an archive and give it to some other server.

We've done some other work[1] around enabling use of CDNs and prebuilt
chunks lately, but again, there are others on the list better able to
explain than me.

[1]: https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt


