RE: With big repos and slower connections, git clone can be hard to work with

On Sunday, July 7, 2024 7:42 PM, ellie wrote:
>I have now encountered a repository where even --deepen=1 is bound to
>fail, because it pulls in something fairly large that takes a few
>minutes to transfer. (Possibly the server proxy has a faulty timeout
>setting that punishes slow connections, but the problem would be the
>same for a connection that is unreliable on the client side.)
>
>So this workaround sadly doesn't seem to cover all cases of resume.
>
>Regards,
>
>Ellie
>
>On 6/8/24 2:46 AM, ellie wrote:
>> The deepening worked perfectly, thank you so much! I hope a resume
>> feature will still be considered, however, even if just to help out
>> newcomers.
>>
>> Regards,
>>
>> Ellie
>>
>> On 6/8/24 2:35 AM, rsbecker@xxxxxxxxxxxxx wrote:
>>> On Friday, June 7, 2024 8:03 PM, ellie wrote:
>>>> Subject: Re: With big repos and slower connections, git clone can be
>>>> hard to work with
>>>>
>>>> Thanks, this is very helpful as an emergency workaround!
>>>>
>>>> Nevertheless, I usually want the entire history, especially since
>>>> I wouldn't mind waiting half an hour. But without resume, I've
>>>> regularly found that the clone just won't complete even when I
>>>> give it the time, while much longer downloads in the browser do.
>>>> The key problem here seems to be the lack of any resume.
>>>>
>>>> I hope this helps to understand why I made the suggestion.
>>>>
>>>> Regards,
>>>>
>>>> Ellie
>>>>
>>>> On 6/8/24 1:33 AM, rsbecker@xxxxxxxxxxxxx wrote:
>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote:
>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to
>>>>>> report a potential issue with "git clone".
>>>>>>
>>>>>> The problem is that any sort of interruption or connection issue,
>>>>>> no matter how brief, causes the clone to stop and leave nothing behind:
>>>>>>
>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko
>>>>>> Cloning into 'nheko'...
>>>>>> remote: Enumerating objects: 43991, done.
>>>>>> remote: Counting objects: 100% (6535/6535), done.
>>>>>> remote: Compressing objects: 100% (1449/1449), done.
>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly:
>>>>>> CANCEL (err 8)
>>>>>> error: 2771 bytes of body are still expected
>>>>>> fetch-pack: unexpected disconnect while reading sideband packet
>>>>>> fatal: early EOF
>>>>>> fatal: fetch-pack: invalid index-pack output
>>>>>> $ cd nheko
>>>>>> bash: cd: nheko: No such file or directory
>>>>>>
>>>>>> In my experience, this can be really impactful with 1. big
>>>>>> repositories and 2. unreliable internet - which I would argue
>>>>>> isn't unheard of! E.g. a developer may work over a mobile
>>>>>> connection on a business trip. The result can even be that a
>>>>>> repository is uncloneable for some users!
>>>>>>
>>>>>> This has left me in the absurd situation where I could download
>>>>>> a tarball of the same project via HTTPS from the git hoster just
>>>>>> fine - even much larger binary release files - thanks to the
>>>>>> browser's HTTPS resume, and yet a simple git clone of the same
>>>>>> project failed repeatedly.
>>>>>>
>>>>>> My deepest apologies if I missed an existing option that
>>>>>> addresses this. But summed up: please consider making git clone
>>>>>> recover from hiccups.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Ellie
>>>>>>
>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing
>>>>>> out slower git clone connections from the server side even if the
>>>>>> transfer is ongoing. A git auto-resume would reduce the impact of
>>>>>> that, too.
>>>>>
>>>>> I suggest that you look into two git topics: --depth, which
>>>>> controls how much history is obtained in a clone, and
>>>>> sparse-checkout, which describes the part of the repository you
>>>>> will retrieve. You can prune the contents of the repository so
>>>>> that the clone is faster, if you do not need all of the history
>>>>> or all of the files. This is typically done in complex large
>>>>> repositories, particularly those used for production support as
>>>>> release repositories.
>>>
>>> Consider doing the clone with --depth=1, then using git fetch
>>> --depth=n as the resume. There are other options that effectively
>>> give you a resume, including --deepen=n.
>>>
>>> Build automation, like Jenkins, uses this to speed up the clone/checkout.
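
To make the earlier suggestions concrete: a shallow, sparse clone looks
roughly like this (only a sketch - the directory names passed to
sparse-checkout are placeholders for whatever parts of the tree you
actually need):

$ git clone --depth=1 --no-checkout https://github.com/Nheko-Reborn/nheko
$ cd nheko
$ git sparse-checkout set src resources   # placeholder paths
$ git checkout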
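
The shallow-then-deepen "resume" goes roughly like this (again a
sketch - the step size is arbitrary, and each fetch only transfers the
next slice of history, so an interruption only costs you the current
step):

$ git clone --depth=1 https://github.com/Nheko-Reborn/nheko
$ cd nheko
$ git fetch --deepen=100    # repeat until deep enough
$ git fetch --unshallow     # optionally fetch all remaining history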

Can you please provide more details on this? It is difficult to understand your issue without knowing exactly what is failing. What size is the transfer? Is it a single large pack file? Can you reproduce this with a script we can try?
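
For example, something along these lines would help (only a sketch - it
assumes the failure can be provoked by cutting the transfer off partway,
and the 30-second timeout is an arbitrary placeholder):

$ rm -rf nheko
$ timeout 30 git clone https://github.com/Nheko-Reborn/nheko
$ ls -d nheko 2>/dev/null || echo "no partial clone left behind"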





