On Thu, Jan 21, 2021 at 7:01 PM William Chen <williamchen32335@xxxxxxxxx> wrote:
>
> Dear Emily,
>
> I see your excellent contribution to git clone. I hope that you are well.

Hi William, this is a question much better directed at the Git list as a whole.

>
> When I try to clone a repo of a large size from github, it is slow.
>
> $ git clone https://github.com/git/git
> ...
> remote: Enumerating objects: 56, done.
> remote: Counting objects: 100% (56/56), done.
> remote: Compressing objects: 100% (25/25), done.
> Receiving objects:  23% (70386/299751), 33.00 MiB | 450.00 KiB/s
>
> The following aria2c command, which can use multiple downloading threads, is much faster. Would you please let me know whether there is a way to speed up git clone (maybe by using parallelization)?

In general, it would be more compelling to see actual numbers than "much faster", e.g. the outputs of `time git clone https://github.com/git/git` and `time aria2c https://github.com/git/git/archive/master.zip` - or even an estimation from you, like, "I think clone takes a minute or two but aria does the same thing in only a couple of seconds". "Much faster" means something different to everyone :)

>
> Your help is much appreciated! I look forward to hearing from you. Thanks.
>
> $ aria2c https://github.com/git/git/archive/master.zip
>
> 01/21 20:16:04 [NOTICE] Downloading 1 item(s)
>
> 01/21 20:16:04 [NOTICE] CUID#7 - Redirecting to https://codeload.github.com/git/git/zip/master

Right here it looks like your zip download redirects to a CDN or something, which is probably better optimized for serving archives than the Git server itself, so I would guess that has something to do with it too.

> [#59b6a2 8.2MiB/0B CN:1 DL:3.8MiB]
> 01/21 20:16:08 [NOTICE] Download complete: /private/tmp/git-master.zip
>
> Download Results:
> gid   |stat|avg speed  |path/URI
> ======+====+===========+=======================================================
> 59b6a2|OK  |   2.9MiB/s|/private/tmp/git-master.zip
>
> Status Legend:
> (OK):download completed.

There are others on the list who are better able to explain this than me. But I'd guess the upshot is that 'git clone https://github.com/git/git' is asking a Git server, which is good at Git repo management (e.g. accepting pushes, generating packfiles to send you a specific object or branch, etc) - but when you ask for "git/git/archive/master.zip" you're getting the result of some work the Git server already did a while ago to zip up the current 'master' into an archive and give it to some other server.

We've done some other work[1] around enabling use of CDNs and prebuilt chunks lately, but again, there are others on the list better able to explain than me.

[1]: https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt
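
For anyone wanting to collect the concrete timings asked for above, here is a minimal sketch of one way to do it. `elapsed` is a hypothetical helper name, not part of git or aria2c, and it assumes a `date` that supports `+%s` (GNU, BSD and macOS all do). The two real commands need network access, so they are left as commented usage.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or aria2c): run a command,
# discard its output, and report wall-clock time in whole seconds
# together with the command's exit status.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))s (exit $status): $*"
}

# Intended usage (needs network access, so left commented out):
#   elapsed git clone https://github.com/git/git
#   elapsed aria2c https://github.com/git/git/archive/master.zip
```

Whole-second resolution is coarse, but it is enough to tell "a minute or two" apart from "a couple of seconds"; the shell's own `time` keyword gives finer numbers if needed.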