Re: With big repos and slower connections, git clone can be hard to work with

ellie <el@xxxxxxxxxxx> · Sat, 8 Jun 2024 02:46:38 +0200

The deepening worked perfectly, thank you so much! I still hope resumable
clones will be considered, even if just to help out newcomers.

Regards,

Ellie

On 6/8/24 2:35 AM, rsbecker@xxxxxxxxxxxxx wrote:
On Friday, June 7, 2024 8:03 PM, ellie wrote:
Subject: Re: With big repos and slower connections, git clone can be hard to work with

Thanks, this is very helpful as an emergency workaround!

Nevertheless, I usually want the entire history, especially since I wouldn't mind
waiting half an hour. But without resume, I've regularly had clones that just
won't complete even if I give them the time, while much longer downloads in the
browser do. The key problem here seems to be the lack of any resume.

I hope this helps to understand why I made the suggestion.

Regards,

Ellie

On 6/8/24 1:33 AM, rsbecker@xxxxxxxxxxxxx wrote:
On Friday, June 7, 2024 7:28 PM, ellie wrote:
I'm terribly sorry if this is the wrong place, but I'd like to
report a potential issue with "git clone".

The problem is that any sort of interruption or connection issue, no
matter how brief, causes the clone to stop and leave nothing behind:

$ git clone https://github.com/Nheko-Reborn/nheko
Cloning into 'nheko'...
remote: Enumerating objects: 43991, done.
remote: Counting objects: 100% (6535/6535), done.
remote: Compressing objects: 100% (1449/1449), done.
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8)
error: 2771 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
$ cd nheko
bash: cd: nheko: No such file or directory

In my experience, this hits hardest with (1) big repositories and (2)
unreliable internet, which I would argue isn't unheard of! For example,
a developer may work over a mobile connection on a business trip. The
result can even be that a repository is uncloneable for some users!

This has left me in the absurd situation where I was able to download
a tarball via HTTPS from the git host just fine, and even much larger
binary release assets, thanks to the browser's HTTPS resume. And yet a
simple git clone of the same project failed repeatedly.

My deepest apologies if I missed an option to fix or address this.
But in short, please consider making git clone recover from hiccups.

Regards,

Ellie

PS: I've seen git hosts with apparent proxy bugs, such as timing out
slower git clone connections from the server side even while the
transfer is still ongoing. A git auto-resume would reduce the impact of that, too.

I suggest that you look into two git topics: --depth, which controls how much
history is obtained in a clone, and sparse-checkout, which limits which parts of the
repository are checked out into your working tree (combined with a partial clone such
as --filter=blob:none, it also limits which file contents are downloaded). You can prune
the contents of the repository so that the clone is faster if you do not need all of the
history or all of the files. This is typically done with complex, large repositories,
particularly release repositories used for production support.
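A minimal sketch of combining both kinds of pruning in one clone. It assumes a git version with `git clone --sparse` and `git sparse-checkout set` (roughly 2.25 or later) and a server that permits filters; if the server does not, git prints a warning and downloads all file contents for the fetched commit instead. `sparse_clone` is a hypothetical helper name, not a git command:

```shell
#!/usr/bin/env bash
# Sketch: shallow + blob-less (partial) clone, then sparse-checkout.
# sparse_clone is a hypothetical helper, not part of git.
sparse_clone() {
  local url=$1 dir=$2
  shift 2
  # --depth=1           : newest commit only, no older history
  # --filter=blob:none  : file contents are fetched on demand
  # --sparse            : initially check out only top-level files
  git clone --depth=1 --filter=blob:none --sparse "$url" "$dir" || return 1
  # Check out only the listed directories; only their blobs get fetched.
  git -C "$dir" sparse-checkout set "$@"
}
```

Usage would look like `sparse_clone https://example.com/big/repo.git repo docs src`, where the URL and directory names are placeholders. The trade-off is that later operations touching other paths or older history will go back to the server for the missing objects.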

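Consider cloning with --depth=1 and then running git fetch --depth=n with increasing n as the resume: each successful fetch keeps what it downloaded, so an interruption only costs the step in progress. There are other options that effectively give you a resume, including git fetch --deepen=n.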
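The depth-then-deepen approach can be scripted roughly as follows. This is a sketch rather than a built-in git feature: `resumable_clone`, its step size and its retry limit are hypothetical names chosen for illustration, and it assumes git 2.15 or later for `git rev-parse --is-shallow-repository`:

```shell
#!/usr/bin/env bash
# Sketch of a "resumable" clone: fetch the newest commit first, then
# deepen the history in small steps. Every completed fetch stays on disk,
# so a dropped connection only loses the step that was in flight.
# resumable_clone and its parameters are hypothetical, not git options.
resumable_clone() {
  local url=$1 dir=$2
  local step=${3:-100}       # commits to add per fetch (assumed default)
  local max_tries=${4:-5}    # consecutive failures allowed per phase
  local tries=0

  # Phase 1: shallow clone of the tip only. git removes the directory
  # itself when a clone fails, so the attempt can simply be repeated.
  until git clone --depth=1 "$url" "$dir"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then return 1; fi
    sleep 2
  done

  # Phase 2: deepen until git no longer considers the repository shallow.
  tries=0
  while [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = true ]; do
    if git -C "$dir" fetch --deepen="$step"; then
      tries=0                # progress was made; reset the failure count
    else
      tries=$((tries + 1))
      if [ "$tries" -ge "$max_tries" ]; then return 1; fi
      sleep 2
    fi
  done
}
```

For example, `resumable_clone https://github.com/Nheko-Reborn/nheko nheko 500`. Smaller steps mean less lost work per interruption but more round trips. Note that a --depth clone implies --single-branch, so only the default branch is fetched this way.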

Build automation, like Jenkins, uses this to speed up the clone/checkout.