Re: Is it possible to resume download on single file.

Hi Tao,

On Tue, 3 Sep 2024, tao lv wrote:

> Is it possible to resume a broken transfer on a per-file basis in Git?
> I am currently using git version 2.44.0.windows.1 (Git for Windows).
> Due to poor connection between my workplace and GitHub, cloning
> projects often fails.
> Therefore, I often operate through the following process: (the
> repository is chosen as git only for example)
>
> ```
> git clone --no-checkout --depth=1 --filter=blob:none https://github.com/git/git
> cd git
> git checkout HEAD -- "*"
> ```
>
> This allows me to download Git files one by one.
> However, sometimes a single file in the remote repository may be too
> large, causing even this method to fail to download all repository
> contents.
> So, I want to know if it is possible to resume transfers when
> downloading Git files? Or will this feature be added to git in the
> future?

Git's design does not allow for resumable clones at this time.

There are a few more or less hacky ways I can think of to address this
problem:

1) Prime the clone with a local "reference" repository

If one of your colleagues already has cloned the repository in question
successfully (and _not_ partially), and if both of you can access a local
network drive, then this colleague could initialize a bare repository on
said network drive that can then be used to accelerate future clones.

The reference repository would be initialized somewhat like this:

	cd /path/to/repository
	git clone --mirror --bare . /path/to/network/drive/repository

The clone operation would look somewhat like this:

	git clone --reference /path/to/network/drive/repository \
		--dissociate https://github.com/the/repository

2) Using bundle URIs

This assumes that your workplace has _some_ server that both has a good
connection to GitHub and can be accessed reliably from your workstation.

The idea is to maintain a set of static bundle files (for details, see
https://git-scm.com/docs/bundle-uri) that can be distributed not only via
Content Delivery Networks ("CDNs") but also via servers that allow
resumable downloads. These bundles would take the role of the "reference
repository" in the previous bullet point.

One tool to make such a setup relatively easy to initialize and maintain
is this: https://github.com/git-ecosystem/git-bundle-server#readme

This does require a server that has reliable network connections to GitHub
(to update the bundles) and also to your workstation, and it requires a
bit of work to maintain.

3) HACK! Downloading the raw blob objects' contents

Since the problem seems to come from the blob objects, you could download
the raw file contents and feed them to `git hash-object -w --stdin` to
populate the local repository. That way, your partial clone would not need
to fetch them via the (non-resumable) `git fetch` protocol.

One way to download those raw contents is via the "Raw" URLs like
https://github.com/git/git/raw/v2.46.0/README.md. You can ensure that the
corresponding Git object is present in your local repository (and will
therefore not need to be fetched again) via something like this:

	curl -Lfs https://github.com/git/git/raw/v2.46.0/README.md |
	git hash-object -w --stdin
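As a sanity check (a sketch; this assumes the `v2.46.0` tag and its trees
are already present in your local repository, and that the raw download is
byte-identical to the committed file, i.e. no line-ending munging):

```shell
# The object ID printed here...
curl -Lfs https://github.com/git/git/raw/v2.46.0/README.md |
git hash-object -w --stdin
# ...should match the blob ID that the tagged tree records for that path:
git rev-parse v2.46.0:README.md
```

If the two IDs differ, the downloaded content does not match the committed
blob and `git fetch` would still try to transfer it.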

That can get a bit tedious if there are many blob objects that need to be
imported into the local repository. To counter that, you could download a
tar archive via something like https://github.com/git/git/tarball/v2.46.0
and then import it via something like `import-tars.perl` (see
https://github.com/git/git/blob/v2.46.0/contrib/fast-import/import-tars.perl).
According to the following public documentation, the tar archives are
cached, so the download should be resumable:
https://docs.github.com/en/repositories/working-with-files/using-files/downloading-source-code-archives#stability-of-source-code-archives

I hope this information is useful to you. If you manage to improve your
situation, I would be curious to learn what worked for you.

Ciao,
Johannes




